I am working on an ANT pattern parser as part of a large server project.
There are some good examples of ANT patterns in the answers to the post "How do I use Nant/Ant naming patterns?"; however, I am still confused about some possible permutations.
One of the examples in the ANT pattern documentation at http://nant.sourceforge.net/release/0.85/help/types/fileset.html is as follows:
**/test/** Matches all files that have a test element in their path, including test as a filename.
My understanding is that ** matches one or more directories and also files under those directories. So I would expect **/test/** to match src/test/subfolder/file.txt and test/file2.txt but this statement seems to imply that it would also match a file named src/test. Is this correct even though there is a / after the test in the pattern?
Also, it's not clear whether the following patterns would be valid:
folder**
folder1/folder**
**folder/file.txt
I would imagine that they would work the same as
folder*/**
folder1/folder*/**
**/*folder/file.txt
but are they allowed?
I did some testing with NAnt as per coolcfan's suggestion and answered my own question. The patterns in the question are all valid.
Based on the following files from the link in my question above:
1. bar.txt
2. src/bar.c
3. src/baz.c
4. src/test/bartest.c
The following unexpected patterns are also valid:
src** matches 2, 3 and 4
**.c matches 2, 3, and 4
**ar.* matches 1 and 2
**/bartest.c/** matches 4
src/ba?.c/** matches 2 and 3
For completeness, these are in addition to the following patterns taken from the link in my question above:
*.c matches nothing (there are no .c files in the current directory)
src/*.c matches 2 and 3
*/*.c matches 2 and 3 (because * only matches one level)
**/*.c matches 2, 3, and 4 (because ** matches any number of levels)
bar.* matches 1
**/bar.* matches 1 and 2
**/bar*.* matches 1, 2, and 4
src/ba?.c matches 2 and 3
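For the parser itself, the results above are consistent with a simple translation of each pattern into a regular expression. The Ruby sketch below is one way to do that. The helper names (ant_pattern_to_regexp, matching_file_numbers) and the token rules are my own assumptions inferred from the test results, not NAnt's actual implementation, and the sketch ignores details such as case sensitivity and backslash separators.

```ruby
# Hypothetical helper: translate an Ant/NAnt-style pattern into an anchored Regexp.
# The token rules are assumptions inferred from the results listed above.
def ant_pattern_to_regexp(pattern)
  body = pattern.gsub(%r{\*\*/|/\*\*\z|\*\*|\*|\?|/|[^*?/]+}) do |token|
    case token
    when '**/' then '(?:.*/)?'   # zero or more leading directories
    when '/**' then '(?:/.*)?'   # trailing /**: the path itself or anything below it
    when '**'  then '.*'         # anything, including slashes
    when '*'   then '[^/]*'      # anything within one path segment
    when '?'   then '[^/]'       # exactly one non-slash character
    else Regexp.escape(token)    # literal text or a plain slash
    end
  end
  Regexp.new("\\A#{body}\\z")
end

EXAMPLE_FILES = ['bar.txt', 'src/bar.c', 'src/baz.c', 'src/test/bartest.c'].freeze

# Return the 1-based numbers of the example files that a pattern matches.
def matching_file_numbers(pattern)
  regexp = ant_pattern_to_regexp(pattern)
  EXAMPLE_FILES.each_index.select { |i| EXAMPLE_FILES[i] =~ regexp }.map { |i| i + 1 }
end

['src**', '**.c', '**ar.*', '**/bartest.c/**', 'src/ba?.c/**', '*/*.c'].each do |pattern|
  puts "#{pattern} matches #{matching_file_numbers(pattern).inspect}"
end
```

Under these assumed rules, **/test/** also matches a path that is exactly src/test, which agrees with the documentation's statement that test may be a file name.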
I'm trying to look for specific files in a directory using a pattern
Let's say I have the id of the user: 101
here are my files
101
101_2
101_5
10111
103
10125
101_6
I'm trying to form a regex pattern which only gives me these files: 101, 101_2, 101_5, 101_6
I'm trying the below pattern
^101_?\d+$
but it doesn't seem to pick up any of the files at all. If I remove the ^, only 101_6 matches for some reason.
EDIT:
I'm using Rails/Ruby to look for files in a particular directory, so something like:
Dir.glob(location).grep("^101_?\d+$")
do something
end
If location isn't the current folder, the paths returned by glob contain both the directory name and the base name, so match against File.basename:
Dir.glob('./*').select { |f| File.basename(f) =~ /\A101(_\d+)?\z/ }.each do |f|
  puts f
  # do something with f
end
Your question isn't particularly clear, but I'm guessing you want to match anything which is 101 followed by an optional underscore and a digit. If so, use the regex ^101_?\d$. If you want 101 followed by either a digit or an underscore and one or more digits, use ^101(_\d+|\d)$
EDIT
As the OP has mentioned in a comment, 101 should also be matched. The updated regex is ^101(?:_?\d)?$
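As a quick check, the patterns from both answers can be run against the file names in the question. This sketch also shows why the original grep call finds nothing: Enumerable#grep compares each element with ===, and for a String argument that is plain equality, so a pattern passed as a String is never applied as a regex. The constant names are only for illustration.

```ruby
USER_FILES = %w[101 101_2 101_5 10111 103 10125 101_6].freeze

# A String argument is compared with String#=== (equality), so nothing matches.
AS_STRING = USER_FILES.grep('^101_?\d+$')

# 101 optionally followed by an underscore and one or more digits.
STRICT = USER_FILES.grep(/\A101(_\d+)?\z/)

# 101 optionally followed by an optional underscore and exactly one digit.
LOOSE = USER_FILES.grep(/\A101(?:_?\d)?\z/)

puts "as string: #{AS_STRING.inspect}"
puts "strict:    #{STRICT.inspect}"
puts "loose:     #{LOOSE.inspect}"
```

Both regexes select 101, 101_2, 101_5 and 101_6 from this list. They differ on other names: the second would also accept 1011 and would reject 101_25, while the first does the opposite.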
I am attempting to match a value like 'MN+WI' at the end of a URL, for example /foos/MN+WI. The pattern [a-zA-Z][\+\,]? produces a match result of MN+WI on rubular.com, but in IRB:
s="MI+WI"
p="[a-zA-Z]{2}[\+\,]?"
r=Regexp.new(p)
r.match(s) # => #<MatchData "MI+">
The behavior in Ruby console is consistent with what I am encountering with Rails. Is there a difference between the two? How do I need to adjust my regex pattern?
$ ruby -v
ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-darwin12.3.0]
$ rails -v
Rails 4.0.0
** edit **
Original pattern should have been [a-zA-Z]{2}[\+\,]?.
What I really need is to have a route recognize any of these variations and assign it to a param:
MN (working)
mn (working)
MN+WI (not working)
MN+WI+IA (an arbitrary number of 2-letter values, separated by a +)
It should not match 1-letter values or values longer than 2 letters (e.g. ABC), but should keep the 2-letter values (e.g. for ABC+MN, keep MN)
As I said in my comment, [a-zA-Z][\+\,]? does not match MN+WI. What you are seeing on Rubular is actually two matches. The first match is MN+, and the second match WI. Rubular just highlights all the matches, so it looks like one long match but it is actually two matches. The behavior should be consistent between Rubular and your local Ruby install.
Your regexp means "2 letters followed by an optional + or ,". So your string has 2 matches. Rubular highlights all matches, so it looks like the whole string is matched, but in reality there are 2 different matches: MN+ and WI.
Rubular is showing the result of the repeated application of the pattern:
[a-zA-Z][\+\,]?
If you put that pattern in a capture group, you'll see each of the individual matches (see http://rubular.com/r/h5iBa5k0fr), each of which matches a single character except for N+.
Your IRB code returns a single match. Note also, though, that your IRB code differs from the regex above because it includes {2}.
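The difference is easy to reproduce in Ruby: String#match returns only the first match, while String#scan returns every match, which is effectively what Rubular highlights. The last pattern below is my own suggestion, not something from the question. It assumes the whole value must consist of 2-letter codes separated by +, and it does not attempt the "keep MN from ABC+MN" case. The constant names are illustrative.

```ruby
PATTERN = /[a-zA-Z]{2}[\+\,]?/

FIRST_MATCH = 'MI+WI'.match(PATTERN)[0]   # only the first match
ALL_MATCHES = 'MI+WI'.scan(PATTERN)       # every match, as Rubular highlights them

# Assumed alternative: anchor the pattern so the whole value must be
# one or more 2-letter codes joined by '+'.
WHOLE_VALUE = /\A[a-zA-Z]{2}(?:\+[a-zA-Z]{2})*\z/

puts FIRST_MATCH
puts ALL_MATCHES.inspect
%w[MN mn MN+WI MN+WI+IA ABC M].each do |value|
  puts "#{value}: #{value =~ WHOLE_VALUE ? 'match' : 'no match'}"
end
```

In a Rails route, a pattern like this would normally go in a constraint for the param, where Rails applies its own anchoring; check the routing guide for the exact rules before relying on \A and \z there.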
I'm not very good at regex.
With my input string LT 1 BLK 4 LAKES OF PARKWAY 5 R/P & AMEND
I'd like to match only the part between the numbers 4 and 5 in the string.
That is, my expected result is LAKES OF PARKWAY.
I've tried to come up with a pattern to get such result.
\d+\s+([A-z ]+)(\d+.*?)*$
but with my pattern, it only matches BLK and 5 R/P & AMEND, as group #1 and group #2 respectively. At the end of my pattern, I decided to anchor to the end of the string with $.
My reasoning was that once 5 R/P & AMEND is matched at the end, the engine would work backwards to the preceding part, so ([A-z ]+) should match LAKES OF PARKWAY.
What's wrong with my pattern, and how can I get it to work?
Any advice would be very much appreciated.
Try \d+\s+(\D+)\d+\D*$
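\D means "anything that is not \d", so the capture group cannot span a digit. It cannot match, for example, the text between the first 1 and the 4, because the rest of the regex would then have to reach the end of the string without crossing another digit, and the later 5 prevents that.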
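Here is a minimal Ruby check of the suggested pattern against the input from the question; the constant names are only illustrative. Note that the captured group keeps the space before the 5, so the sketch strips it.

```ruby
INPUT = 'LT 1 BLK 4 LAKES OF PARKWAY 5 R/P & AMEND'

# \D+ cannot cross a digit, and \d+\D*$ forces the digits after the
# group to be the last digits in the string.
SUGGESTED = /\d+\s+(\D+)\d+\D*$/

CAPTURED = INPUT[SUGGESTED, 1]   # text of the first capture group, or nil
puts CAPTURED.strip
```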
I was reviewing the PowerShell grammar posted here: http://www.manning.com/payette/AppCexcerpt.pdf
(I don't think it has been updated since PowerShell v1, and there are some typos. So, it's clearly not the true PowerShell Grammar, but a human-oriented document.)
In section C.2.1, it says:
<lvalueExpression> = <lvalue> [? |? <lvalue>]*
What is the meaning of the question marks? I can't tell if it means "match any character" or "match a question mark" or it's a typo.
I'm not sure what inputs this is intended to match, but maybe it's this:
$a,$b = 1, 2
in which case maybe the question mark is supposed to be a comma?
Based on its use in the preceding rule (<assignmentStatementRule> = <lvalueExpression> <AssignmentOperatorToken> <pipelineRule>), it appears that lvalueExpression in Appendix C of Windows PowerShell in Action corresponds to expression in section B.2.3 of The PowerShell Language Specification that Joey linked to. Matching it further than this is difficult, but I'll add some speculation anyway :)
The ? characters in [? |? <lvalue>]* are very likely erroneous. If it had been used to represent "the previous token is optional", then:
the [ and | tokens it was applied to should have been quoted
only [ makes sense as part of a value expression, but indexing is already covered later by the propertyOrArrayReferenceOperator rule
? is not used anywhere else in the grammar, but {0|1} is used multiple times to indicate "can appear zero or one times"
Given its similarity to [ '|' <cmdletCall> ]* at the end of the first rule in the section, it may have been a copy-and-paste error, compounded by a ‘smart quote’ round-trip encoding error. Assuming this was copied with the intent of editing later, then ?|? may have become '.' to represent multiple property accesses (but again, this is covered by the propertyOrArrayReferenceOperator rule).
However, based on the statement at the end of section C.2.1 that "[the pipeline rule] also handles parsing assignment expressions", lvalueExpression was probably intended to list all the assignable expressions besides simpleLvalue (e.g. cast-expression for [int]$x = 1, array-literal-expression for $a,$b,$c = 1,2,3, etc.).
I have to admit that I always forget the syntactical intricacies of the naming patterns for NAnt (e.g. those used in filesets). The double asterisk/single asterisk stuff seems to be very forgettable in my mind.
Can someone provide a definitive guide to the naming patterns?
The rules are:
a single star (*) matches zero or more characters within a path name
a double star (**) matches zero or more characters across directory levels
a question mark (?) matches exactly one character within a path name
Another way to think about it is double star (**) matches slash (/) but single star (*) does not.
Let's say you have the files:
1. bar.txt
2. src/bar.c
3. src/baz.c
4. src/test/bartest.c
Then the patterns:
*.c matches nothing (there are no .c files in the current directory)
src/*.c matches 2 and 3
*/*.c matches 2 and 3 (because * only matches one level)
**/*.c matches 2, 3, and 4 (because ** matches any number of levels)
bar.* matches 1
**/bar.* matches 1 and 2
**/bar*.* matches 1, 2, and 4
src/ba?.c matches 2 and 3
Here are a few extra pattern matches which are not so obvious from the documentation. Tested using NAnt for the example files in benzado's answer:
1. bar.txt
2. src/bar.c
3. src/baz.c
4. src/test/bartest.c
src** matches 2, 3 and 4
**.c matches 2, 3, and 4
**ar.* matches 1 and 2
**/bartest.c/** matches 4
src/ba?.c/** matches 2 and 3
Double asterisks (**) are used to match folder names, whereas the single asterisk (* = multiple characters) and the question mark (? = single character) are used to match file names.
Check out the NAnt reference. The fileset patterns are:
'*' matches zero or more characters, e.g. *.cs
'?' matches one character, e.g. ?.cs
And '**' matches a directory tree e.g. src/**/*.cs will find all cs files in any sub-directory of src.