I can't figure out what does this regex match:
A: "\\/\\/c\\/(\\d*)"
B: "\\/\\/(\\d*)"
I suppose they are matching some kind of number sequence since \d matches any digit but I'd like to know an example of a string that would be a match for this regex.
The pattern syntax is that specified by ICU. Expressions are created with NSRegularExpression in an iOS app and are correct.
The first matches //c/ + 0 or more digits. The second matches // + 0 or more digits. In both the digits are captured.
An example of a match for A) is //c/123
An example of a match for B) is //12345
When I use Cygwin which emulates Bash on Windows, I sometimes run into situations where I have to escape my escape characters which is what I think is making this expression look so weird. For instance, when I use sed to look for a single '\' I sometimes have to write it as '\\\\'. (Funny, StackOverflow proved my point. If you write 4 backslashes in the comment, it only shows two. So if you process it again, they might all disappear depending on your situation).
Considering this, it might be helpful to think of pairs of backslashes as representing only one if you're coming from a similar situation. My guess would be you are. Because of this I would say Erik Duymelinck is probably spot on. This will capture a sequence of digits that may or may not follow a couple slashes and a c:
//c/000
//00000
This regex matches an odd sequence of characters, which, at first glance, almost seem like a regex, since \d is a digit, and followed by an asterisk (\d*) would mean zero-or-more digits. But it's not a digit, because the escape-slash is escaped.
\\/\\/c\\/(\\d*)
So, for instance, this one matches the following text:
\/\/c\/\
\/\/c\/\d
\/\/c\/\dd
\/\/c\/\ddd
\/\/c\/\dddd
\/\/c\/\ddddd
\/\/c\/\dddddd
...
This one is almost the same
\\/\\/(\\d*)
except you just delete the c\/ from the above results:
\/\/\
\/\/\d
\/\/\dd
\/\/\ddd
\/\/\dddd
\/\/\ddddd
\/\/\dddddd
...
In both cases, the final \ and optional d is [capture group][1] one.
My first impression was that these regexes were intended for escaping in Java strings, meaning they would be completely invalid. If the were escaped for Java strings, such as
Pattern p = Pattern.compile("\\/\\/c\\/(\\d*)");
It would be invalid, because after un-escaping, it would result in this invalid regex:
\/\/c\/(\d*)
The single escape-slashes (\) are invalid. But the \d is valid, as it would mean any digit.
But again, I don't think they're invalid, and they're not escaped for a Java string. They're just odd.
Related
If I have a search: (\d\d):(\d\d) and I want to add an extra 0 to the numbers that I find (ie, 12:30 would become 120:130), how do I prevent the 0 being interpreted as \10 and \20:
\10:\20
I tried escaping it with \ but that just made more backreferences. Is there another way to escape in grep?
In your original post, you didn't mention that you're using these backreferences in the replacement pattern, not the search pattern. You also didn't mention that you're using BBEdit. Solving your problem requires both of those facts.
From page 209 of the BBEdit manual:
\NNN+
If more than two decimal digits follow the backslash, only the first two are considered part of the backreference. Thus, “\111” would be interpreted as the 11th backreference, followed by a literal “1”. You may use a leading zero; for example, if in your replacement pattern you want the first backreference followed by a literal “1”, you can use “\011”. (If you use “\11”, you will get the 11th backreference, even if it is empty.)
Therefore you should try this replacement pattern:
\010:\020
I've been looking for a good way to see if a string of items are all numbers, and thought there might be a way of specifying a range from 0 to 9 and seeing if they're included in the string, but all that I've looked up online has really confused me.
def validate_pin(pin)
(pin.length == 4 || pin.length == 6) && pin.count("0-9") == pin.length
end
The code above is someone else's work and I've been trying to identify how it works. It's a pin checker - takes in a set of characters and ensures the string is either 4 or 6 digits and all numbers - but how does the range work?
When I did this problem I tried to use to_a? Integer and a bunch of other things including ranges such as (0..9) and ("0..9) and ("0".."9") to validate a character is an integer. When I saw ("0-9) it confused the heck out of me, and half an hour of googling and youtube has only left me with regex tutorials (which I'm interested in, but currently just trying to get the basics down)
So to sum this up, my goal is to understand a more semantic/concise way to identify if a character is an integer. Whatever is the simplest way. All and any feedback is welcome. I am a new rubyist and trying to get down my fundamentals. Thank You.
Regex really is the right way to do this. It's specifically for testing patterns in strings. This is how you'd test "do all characters in this string fall in the range of characters 0-9?":
pin.match(/\A[0-9]+\z/)
This regex says "Does this string start and end with at least one of the characters 0-9, with nothing else in between?" - the \A and \z are start-of-string and end-of-string matchers, and the [0-9]+ matches any one or more of any character in that range.
You could even do your entire check in one line of regex:
pin.match(/\A([0-9]{4}|[0-9]{6})\z/)
Which says "Does this string consist of the characters 0-9 repeated exactly 4 times, or the characters 0-9, repeated exactly 6 times?"
Ruby's String#count method does something similar to this, though it just counts the number of occurrences of the characters passed, and it uses something similar to regex ranges to allow you to specify character ranges.
The sequence c1-c2 means all characters between c1 and c2.
Thus, it expands the parameter "0-9" into the list of characters "0123456789", and then it tests how many of the characters in the string match that list of characters.
This will work to verify that a certain number of numbers exist in the string, and the length checks let you implicitly test that no other characters exist in the string. However, regexes let you assert that directly, by ensuring that the whole string matches a given pattern, including length constraints.
Count everything non-digit in pin and check if this count is zero:
pin.count("^0-9").zero?
Since you seem to be looking for answers outside regex and since Chris already spelled out how the count method was being implemented in the example above, I'll try to add one more idea for testing whether a string is an Integer or not:
pin.to_i.to_s == pin
What we're doing is converting the string to an integer, converting that result back to a string, and then testing to see if anything changed during the process. If the result is =>true, then you know nothing changed during the conversion to an integer and therefore the string is only an Integer.
EDIT:
The example above only works if the entire string is an Integer and won’t properly deal with leading zeros. If you want to check to make sure each and every character is an Integer then do something like this instead:
pin.prepend(“1”).to_i.to_s(1..-1) == pin
Part of the question seems to be exactly HOW the following portion of code is doing its job:
pin.count("0-9")
This piece of the code is simply returning a count of how many instances of the numbers 0 through 9 exist in the string. That's only one piece of the relevant section of code though. You need to look at the rest of the line to make sense of it:
pin.count("0-9") == pin.length
The first part counts how many instances then the second part compares that to the length of the string. If they are equal (==) then that means every character in the string is an Integer.
Sometimes negation can be used to advantage:
!pin.match?(/\D/) && [4,6].include?(pin.length)
pin.match?(/\D/) returns true if the string contains a character other than a digit (matching /\D/), in which case it it would be negated to false.
One advantage of using negation here is that if the string contains a character other than a digit pin.match?(/\D/) would return true as soon as a non-digit is found, as opposed to methods that examine all the characters in the string.
I have been coding a program in Lua that automatically formats IRC logs from a roleplay. In the roleplay logs there is a specific guideline for "Out of character" conversation, which we use double parentheses for. For example: ((<Things unrelated to roleplay go here>)). I have been trying to have my program remove text between double brackets (and including both brackets). The code is:
ofile = io.open("Output.txt", "w")
rfile = io.open("Input.txt", "r")
p = rfile:read("*all")
w = string.gsub(p, "%(%(.*?%)%)", "")
ofile:write(w)
The pattern here is > "%(%(.*?%)%)" I've tried multiple variations of the pattern. All resulted in fruitless results:
1. %(%(.*?%)%) --Wouldn't do anything.
2. %(%(.*%)%) --Would remove *everything* after the first OOC message.
Then, my friend told me that prepending the brackets with percentages wouldn't work, and that I had to use backslashes to 'escape' the parentheses.
3. \(\(.*\)\) --resulted in the output file being completely empty.
4. (\(\(.*\)\)) --Same result as above.
5. (\(\(.*?\)\) --would for some reason, remove large parts of the text for no apparent reason.
6. \(\(.*?\)\) --would just remove all the text except for the last line.
The short, absolute question:
What pattern would I need to use to remove all text between double parentheses, and remove the double parentheses themselves too?
You're friend is thinking of regular expressions. Lua patterns are similar, but different. % is the correct escape character.
Your pattern should be %(%(.-%)%). The - is similar to * in that it matches any number of the preceding sequence, but while * tries to match as many characters as it can (it's greedy), - matches the least amount of characters possible (it's non-greedy). It won't go overboard and match extra double-close-parenthesis.
I have a string from which I want to extract a certain part:
Original String: /abc/d7_t/g-12/jkl/m-n3/pqr/stu/vwx
Result Desired: /abc/d7_t/g-12/jkl/
The number of characters can vary in the entire string. It has alphabets, numbers, underscore and hyphen. I want to basically cut the string after the 5th "/"
I tried a few regex, but it seems there is some mistake with the format.
If a non-regexp approach is acceptable, how about this:
s.split('/').take(n).join('/')+'/'
Where s if your string (in your case: /abc/d7_t/g-12/jkl/m-n3/pqr/stu/vwx).
def cut_after(s, n)
s.split('/').take(n).join('/')+'/'
end
Then
cut_after("/abc/d7_t/g-12/jkl/m-n3/pqr/stu/vwx", 5)
should work. Not as compact as a regexp, but some people may find it clearer.
The regexp would be: %r(/(?:[^/]+/){4}). Note that it is a good idea in this case to use the %r literal version to avoid escaping slashes. Unescaped slashes are likely the cause of your format errors.
Match any sequence of chars except '/' 4 times :-
(\/[^\/]+){4}\/
I have to validate username in my app so that it cannot contain two consecutive period symbols. I tried the following.
username.match(/(..)/)
but found out that this matches "a." and "a..". I expected to get nil as the output of the match operation for input "a.". Is my approach right ?
You need to put a \ in front of the periods, because period is a reserved character for "any character except newline".
So try:
username.match(/(\.\.)/)
Short answer
You can use something like this (see on rubular.com):
username.match(/\.{2}/)
The . is escaped by preceding with a backslash, {2} is exact repetition specifier, and the brackets are removed since capturing is not required in this case.
On metacharacters and escaping
The dot, as a pattern metacharacter, matches (almost) any character. To match a literal period, you have at least two options:
Escape the dot as \.
Match it as a character class singleton [.]
Other metacharacters that may need escaping are | (alternation), +/*/?/{/} (repetition), [/] (character class), ^/$ (anchors), (/) (grouping), and of course \ itself.
References
regular-expressions.info/Literals and metacharacters, Dot: ., Character Class: […], Anchors: ^$, Repetition: *+?{…}, Alternation: |, Optional: ?, Grouping: (…)
On finite repetition
To match two literal periods, you can use e.g. \.\. or [.][.], i.e. a simple concatenation. You can also use the repetition construct, e.g. \.{2} or [.]{2}.
The finite repetition specifier also allows you write something like x{3,5} to match at least 3 but at most 5 x.
Note that repetition has a higher precedence that concatenation, so ha{3} doesn't match "hahaha"; it matches "haaa" instead. You can use grouping like (ha){3} to match "hahaha".
On grouping
Grouping (…) captures the string it matches, which can be useful when you want to capture a match made by a subpattern, or if you want to use it as a backreference in the other parts of the pattern.
If you don't need this functionality, then a non-capturing option is (?:…). Thus something like (?:ha){3} still matches "hahaha" like before, but without creating a capturing group.
If you don't actually need the grouping aspect, then you can just leave out the brackets altogether.