How do I use collect keep in parse, to get embedded blocks?

Looking at the html example here: http://www.red-lang.org/2013/11/041-introducing-parse.html
I would like to parse the following:
"val1-12*more text-something"
Where:
"-" marks values which should be in the same block, and
"*" should start a new block.
So, I want this:
[ ["val1" "12"] ["more text" "something"] ]
and at the moment I get this:
red>> data: "val1-12*more text-something"
== "val1-12*more text-something"
red>> c: charset reduce ['not #"-" #"*"]
== make bitset! [not #{000000000024}]
red>> parse data [collect [any [keep any c [#"-" | #"*" | end ]]]]
== ["val1" "12" "more text" "something"]
(I actually tried some other permutations, which didn't get me any farther.)
So, what's missing?

You can make it work by nesting COLLECT. For example:
keep-pair: [
keep some c
#"-"
keep some c
]
parse data [
collect [
some [
collect [keep-pair]
#"*"
collect [keep-pair]
]
]
]
Using your example input this outputs the result you wanted:
[["val1" "12"] ["more text" "something"]]
However, I have a feeling you may have wanted the parse rule to be more flexible than the example input suggests?
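For comparison only, and in Ruby rather than Red: the target structure can be read as "split on *, then split each piece on -", which copes with any number of groups. This is a minimal sketch under that reading of the format; the method name group_values is hypothetical and does not come from the thread.

```ruby
# Hypothetical helper: "*" starts a new block, "-" separates values inside a block.
def group_values(data)
  data.split("*").map { |group| group.split("-") }
end

# One inner array per "*"-separated group.
GROUPED = group_values("val1-12*more text-something")
```

A nested COLLECT rule in Red can express the same idea by wrapping the inner COLLECT in a loop that is separated by "*".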

Related

Transforming a string in Ruby w/ `gsub` using an array of regexps

I have a string that I want to transform using Ruby's gsub and a large number of regexps, with their transformations stored in an array of arrays.
I'd like to do something like this:
MY_REGEXPS = [
[
/^(\d-\d:) (SINGLE|DOUBLE|TRIPLE)/,
proc { "#{$1} #{$2.capitalize}," }
],
# ...many more regexp/transformation pairs
]
my_string = "0:0 SINGLE (Line Drive, 89XD)"
MY_REGEXPS.inject(my_string) do |str, regexp_pair|
str.gsub(regexp_pair.first, &regexp_pair.last)
end
However, the proc is not bound to the context of the gsub match, so variables like $1 and $2 are not available. I have also confirmed that if I use the same regexp and transformation in a normal call to gsub, like:
my_string.gsub(/^(\d-\d:) (SINGLE|DOUBLE|TRIPLE)/) do
"#{$1} #{$2.capitalize},"
end
the code works just fine.
Can anyone tell me a way for me to bind that proc to the context of the gsub so I can access $1 and $2?
Thanks

Perhaps the following or a variant would meet your needs.
MY_REGEXPS = [
[
/^(\p{L}+) (\d:\d) (SINGLE|DOUBLE|TRIPLE) \1/i,
proc { |_,v2,v3| "#{v2} #{v3.capitalize}," }
],
]
my_string = "dog 1:2 single dog (Line Drive, 89XD)"
MY_REGEXPS.inject(my_string) do |s,(re,p)|
p.call(*s.match(re).captures)
end
#=> "1:2 Single,"
I've included capture group #1, (\p{L}+) (match one or more letters), to show that a capture group which is not relevant to the proc's calculation can still be present, because all of MatchData#captures is passed to the proc. Capture group #1 is used here to ensure that its content appears again later in the string, at the position of the back-reference \1.
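If the goal is still to run gsub over the whole string, one possible variant is to keep a literal block at the gsub call site and hand the MatchData to the stored handler, instead of relying on $1 and $2 inside the proc. This is a sketch, not the answer's code: the names RULES and apply_rules are hypothetical, and the pattern assumes the score is written as "0:0" (as in the sample string) rather than "0-0:".

```ruby
# Each rule pairs a pattern with a handler that receives the MatchData.
RULES = [
  [/^(\d:\d) (SINGLE|DOUBLE|TRIPLE)/, ->(m) { "#{m[1]} #{m[2].capitalize}," }]
].freeze

# Apply every rule in turn, feeding the result of one gsub into the next.
def apply_rules(text, rules)
  rules.inject(text) do |str, (pattern, handler)|
    # The literal block runs in this method's scope, so Regexp.last_match
    # refers to the match that gsub has just made.
    str.gsub(pattern) { handler.call(Regexp.last_match) }
  end
end

TRANSFORMED = apply_rules("0:0 SINGLE (Line Drive, 89XD)", RULES)
```

Because the handler receives its captures as an argument, it no longer matters where the lambda was defined.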

Why do you use %w[] in rails?

Why would you ever use %w[] considering arrays in Rails are type-agnostic?

This is the most concise way to define an array of strings, because you don't have to type quotes and commas.
%w(abc def xyz)
Instead of
['abc', 'def', 'xyz']

Duplicate question of
http://stackoverflow.com/questions/1274675/what-does-warray-mean
http://stackoverflow.com/questions/5475830/what-is-the-w-thing-in-ruby
For more details you can follow https://simpleror.wordpress.com/2009/03/15/q-q-w-w-x-r-s/
These are the types of percent strings in ruby:
%w : Array of Strings
%i : Array of Symbols
%q : String
%r : Regular Expression
%s : Symbol
%x : Backtick (capture subshell result)
Let's take an example.
Suppose you have a sentence such as
Thanks for contributing an answer to Stack Overflow!
When you write
%w(Thanks for contributing an answer to Stack Overflow!)
you get
=> ["Thanks", "for", "contributing", "an", "answer", "to", "Stack", "Overflow!"]
To keep several words together as a single element, escape the space with a backslash (\).
For example,
%w(Thanks for contributing an answer to Stack\ Overflow!)
gives
=> ["Thanks", "for", "contributing", "an", "answer", "to", "Stack Overflow!"]
Here the Ruby interpreter splits the input on whitespace. A backslash before a space joins the next word to the current one, and the result is pushed into the array as a single string.
You can also use it like this (the elements are still strings):
%w[2 4 5 6]
If you write
%w("abc" "def")
the quotes become part of the strings:
=> ["\"abc\"", "\"def\""]

%w(abc def xyz) is a shortcut for ["abc", "def","xyz"]. Meaning it's a notation to write an array of strings separated by spaces instead of commas and without quotes around them.
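The literals discussed above can be compared side by side in a short sketch. The constant names are only for illustration; %W (uppercase) is the interpolating counterpart of %w.

```ruby
# %w builds an array of strings without quotes or commas.
PLAIN = %w[abc def xyz]

# %i builds an array of symbols the same way.
SYMBOLS = %i[abc def xyz]

# A backslash-escaped space keeps two words in one element.
JOINED = %w[Stack\ Overflow! rocks]

# %W allows interpolation, like a double-quoted string.
SUFFIX = "xyz"
INTERPOLATED = %W[abc #{SUFFIX}]
```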

Does anyone have an efficient R3 function that mimics the behaviour of find/any in R2?

Rebol2 has an /ANY refinement on the FIND function that can do wildcard searches:
>> find/any "here is a string" "s?r"
== "string"
I use this extensively in tight loops that need to perform well. But the refinement was removed in Rebol3.
What's the most efficient way of doing this in Rebol3? (I'm guessing a parse solution of some sort.)

Here's a stab at handling the "*" case:
like: funct [
series [series!]
search [series!]
][
rule: copy []
remove-each s b: parse/all search "*" [empty? s]
foreach s b [
append rule reduce ['to s]
]
append rule [to end]
all [
parse series rule
find series first b
]
]
used as follows:
>> like "abcde" "b*d"
== "bcde"

I had edited your question for "clarity" and changed it to say 'was removed'. That made it sound like it was a deliberate decision. Yet it actually turns out it may just not have been implemented.
BUT if anyone asks me, I don't think it should be in the box...and not just because it's a lousy use of the word "ANY". Here's why:
You're looking for patterns in strings...so if you're constrained to using a string to specify that pattern you get into "meta" problems. Let's say I want to extract the word *Rebol* or ?Red?, now there has to be escaping and things get ugly all over again. Back to RegEx. :-/
So what you might actually want isn't a STRING! pattern like s?r but a BLOCK! pattern like ["s" ? "r"]. This would permit constructs like ["?" ? "?"] or [{?} ? {?}]. That's better than rehashing the string hackery that every other language uses.
And that's what PARSE does, albeit in a slightly-less-declarative way. It also uses words instead of symbols, as Rebol likes to do. [{?} skip {?}] is a match rule where skip is an instruction that moves the parse position past any single element of the parse series between the question marks. It could also do so if it were parsing a block as input, and would match [{?} 12-Dec-2012 {?}].
I don't know entirely what the behavior of /ANY would-or-should be with something like "ab??cd e?*f"... if it provided alternate pattern logic or what. I'm assuming the Rebol2 implementation is brief? So likely it only matches one pattern.
To set a baseline, here's a possibly-lame PARSE solution for the s?r intent:
>> parse "here is a string" [
some [ ; match rule repeatedly
to "s" ; advance to *before* "s"
pos: ; save position as potential match
skip ; now skip the "s"
[ ; [sub-rule]
skip ; ignore any single character (the "?")
"r" ; match the "r", and if we do...
return pos ; return the position we saved
| ; | (otherwise)
none ; no-op, keep trying to match
]
]
fail ; have PARSE return NONE
]
== "string"
If you wanted it to be s*r you would change the skip "r" return pos into a to "r" return pos.
On an efficiency note, I'll mention that it is indeed the case that characters are matched against characters faster than strings. So to #"s" and #"r" to end make a measurable difference in the speed when parsing strings in general. Beyond that, I'm sure others can do better.
The rule is certainly longer than "s?r". But it's not that long when comments are taken out:
[some [to #"s" pos: skip [skip #"r" return pos | none]] fail]
(Note: It does leak pos: as written. Is there a USE in PARSE, implemented or planned?)
Yet a nice thing about it is that it offers hook points at all the moments of decision, and without the escaping defects a naive string solution has. (I'm tempted to give my usual "Bad LEGO alligator vs. Good LEGO alligator" speech.)
But if you don't want to code in PARSE directly, it seems the real answer would be some kind of "Glob Expression"-to-PARSE compiler. It might be the best interpretation of glob Rebol would have, because you could do a one-off:
>> parse "here is a string" glob "s?r"
== "string"
Or if you are going to be doing the match often, cache the compiled expression. Also, let's imagine our block form uses words for literacy:
s?r-rule: glob ["s" one "r"]
pos-1: parse "here is a string" s?r-rule
pos-2: parse "reuse compiled RegEx string" s?r-rule
It might be interesting to see such a compiler for regex as well. These also might accept not only string input but also block input, so that both "s.r" and ["s" . "r"] were legal...and if you used the block form you wouldn't need escaping and could write ["." . "."] to match ".A."
Fairly interesting things would be possible. Given that in RegEx:
(abc|def)=\g{1}
matches abc=abc or def=def
but not abc=def or def=abc
Rebol could be modified to take either the string form or compile into a PARSE rule with a form like:
regex [("abc" | "def") "=" (1)]
Then you get a dialect variation that doesn't need escaping. Designing and writing such compilers is left as an exercise for the reader. :-)
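The "compile the glob once, then reuse it" idea can be illustrated outside Rebol. The following is a rough Ruby sketch that compiles to a Regexp rather than to a PARSE rule, so it is an analogy and not the compiler described above; the names glob_to_regexp and like are hypothetical.

```ruby
# Translate a glob into a Regexp: "*" matches any run of characters,
# "?" matches exactly one, and everything else is matched literally.
def glob_to_regexp(glob)
  parts = glob.each_char.map do |ch|
    case ch
    when "*" then ".*?"
    when "?" then "."
    else Regexp.escape(ch) # literal text never needs manual escaping
    end
  end
  Regexp.new(parts.join, Regexp::MULTILINE)
end

# Return the text starting at the first match, or nil (like FIND/ANY in Rebol2).
def like(text, pattern)
  index = text.index(pattern)
  index && text[index..-1]
end

# Compile once, reuse across many searches.
S_R_RULE = glob_to_regexp("s?r")
FIRST_HIT = like("here is a string", S_R_RULE)
SECOND_HIT = like("reuse compiled RegEx string", S_R_RULE)
```

Keeping the compile step separate from the search step is what makes caching possible in a tight loop.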

I've broken this into two functions: one that creates a rule to match the given search value, and the other to perform the search. Separating the two allows you to reuse the same generated parse block where one search value is applied over multiple iterations:
expand-wildcards: use [literal][
literal: complement charset "*?"
func [
{Creates a PARSE rule matching VALUE expanding * (any characters) and ? (any one character)}
value [any-string!] "Value to expand"
/local part
][
collect [
parse value [
; empty search string FAIL
end (keep [return (none)])
|
; only wildcard return HEAD
some #"*" end (keep [to end])
|
; everything else...
some [
; single char matches
#"?" (keep 'skip)
|
; textual match
copy part some literal (keep part)
|
; indicates the use of THRU for the next string
some #"*"
; but first we're going to match single chars
any [#"?" (keep 'skip)]
; it's optional in case there's a "*?*" sequence
; in which case, we're going to ignore the first "*"
opt [
copy part some literal (
keep 'thru keep part
)
]
]
]
]
]
]
like: func [
{Finds a value in a series and returns the series at the start of it.}
series [any-string!] "Series to search"
value [any-string! block!] "Value to find"
/local skips result
][
; shortens the search a little where the search starts with a regular char
skips: switch/default first value [
#[none] #"*" #"?" ['skip]
][
reduce ['skip 'to first value]
]
any [
block? value
value: expand-wildcards value
]
parse series [
some [
; we have our match
result: value
; and return it
return (result)
|
; step through the string until we get a match
skips
]
; at the end of the string, no matches
fail
]
]
Splitting the function also gives you a base to optimize the two different concerns: finding the start and matching the value.
I went with PARSE because, even though * and ? are seemingly simple rules, nothing is quite as expressive and quick as PARSE for implementing such a search.
It might yet be worthwhile, as @HostileFork suggests, to consider a dialect instead of strings with wildcards, even to the point where Regex is replaced by a compile-to-parse dialect, but that is perhaps beyond the scope of the question.

Word signature in factor

I am trying to iterate over an array which contains weather data. That already works fine, and I can also load the data that matters to me from the array. To do so, I wrote a helper word which looks like this:
: get-value ( hsh str -- str ) swap at* drop ;
[ "main" get-value "temp" get-value ] each 9 [ + ] times
This code pushes the temperature values from the array onto the stack and sums them. "main" and "temp" are the keys of the arrays.
I execute it with this command (get-weather-list generates the array):
"Vienna" get-weather-list [ "main" get-value "temp" get-value ] each 9 [ + ] times
The result is a number on the stack. Now I want to split this call into one or two words. For example:
: get-weather-information ( city -- str )
get-weather-list
[ "main" get-value "temp" get-value ] each 9 [ + ] times ;
The problem is that I don't really understand the word's signature. I always get "The input quotation to “each” doesn't match its expected effect". I have tried a lot but can't find a way to fix this problem. Does anyone have an idea? I am grateful for any help :)
Cheers
Stefan

This is a very old question by now, but it still may be useful to someone.
First, about each: the stack effect of the quotation is (... x -- ...).
That means it consumes an input, and outputs nothing. Your quotation worked on the interpreter because it lets you get away with "wrong" code. But for calling each from a defined word, your quotation can't output anything.
So each is not what you want. If you try to push a variable amount of values to the stack, you'll have the same kind of trouble again. Sequence words all output a fixed amount of values.
What you want to do is one of two things:
Make a new sequence with just the values you want, and then call sum on it.
Use something like reduce, to accumulate the sum as you process your list.
For example, with reduce:
: get-weather-information ( city -- sum )
get-weather-list 0 [ "main" get-value "temp" get-value + ] reduce ;
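The two options listed above (build a new sequence and sum it, or accumulate with reduce) have the same shape in most languages. Here is a Ruby sketch of both, purely for comparison; the WEATHER sample and the method names are hypothetical and only mimic the "main"/"temp" layout described in the question.

```ruby
# Hypothetical sample shaped like the data described in the question.
WEATHER = [
  { "main" => { "temp" => 10.0 } },
  { "main" => { "temp" => 12.5 } },
  { "main" => { "temp" => 7.5 } }
].freeze

# Option 1: extract the values into a new sequence, then sum it.
def total_by_sum(list)
  list.map { |entry| entry["main"]["temp"] }.sum
end

# Option 2: accumulate while walking the list (the reduce approach).
def total_by_reduce(list)
  list.reduce(0) { |acc, entry| acc + entry["main"]["temp"] }
end
```

In both versions the block returns exactly one value per element, which is the fixed stack effect that each could not provide.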

How to use PARSE with COLLECT/KEEP to get a file path (with slash) and filename?

I'm trying to take a FILE! string type and turn it into two parts. One that is all the part up to the final slash in the directories, and one that is just the name of the file itself.
I tried this:
>> parse %dir/other-dir/file.ext [collect [keep thru [any thru %/] keep to end]]
But that just gives me back the full name [%dir/other-dir/file.ext] I would have liked to get the two element block [%dir/other-dir/ file.ext].
(I'd also like [none file.ext] if I had given an input like just %file.ext)

You've got one 'thru too many.
red>> parse %abc/file.ext [collect[keep [some [thru #"/"] | keep (none) ] keep to end]]
== [%abc/ %file.ext]
red>> parse %/abc/file.ext [collect[keep [some [thru #"/"] | keep (none) ] keep to end]]
== [%/abc/ %file.ext]
red>> parse %/abc/def/file.ext [collect[keep [some [thru #"/"] | keep (none) ] keep to end]]
== [%/abc/def/ %file.ext]
I am using 'some, so that the rule fails, if there's no slash in the input. Then using "| keep (none)" keeps the 'none you want.
"keep (something)" keeps the return value of running "something" through the 'do dialect.
red>> parse %file.ext [collect[keep [some [thru #"/"] | keep (none) ] keep to end]]
== [none %file.ext]
Without it, you would only get the file part.
red>> parse %file.ext [collect[keep [any [thru #"/"]] keep to end]]
== [%file.ext]
red>> parse %/abc/def/file.ext [collect[keep [any [thru #"/"]] keep to end]]
== [%/abc/def/ %file.ext]
