GREP: Find lines more than 12 characters excluding spaces - grep

I am using a GREP search in Sublime Text 3. I want to find all lines that are more than 12 characters, excluding spaces.
Example
Mac and Cheese
Peanut Butter and Jelly Sandwich
In the above example, Mac and Cheese would not be found, because it's exactly 12 characters excluding spaces.
How would I do this?
I can use the following to find all lines that are more than 12 characters. But I am not sure how to exclude spaces:
(?<=.{13}).+

The pattern (?<=.{13}).+ asserts that there are 13 characters to the left of the current position, and because the dot also matches a space, spaces are counted. It then matches any char except a newline 1+ times.
Instead, you could match optional horizontal whitespace chars, then repeat 13 or more times a non-whitespace char, for example \S (or specify what you would allow to match), followed by 0+ horizontal whitespace chars.
^\h*(?:\S\h*){13,}$
^ Start of the line
\h* Match 0+ times a horizontal whitespace char
(?: Non capturing group
\S\h* Match non whitespace char, then 0+ horizontal whitespace chars
){13,} Close group and repeat 13+ times
$ End of the line
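As a rough way to try the same idea outside Sublime Text, here is a minimal Python sketch. Python's re module does not support \h, so the class [^\S\r\n] (whitespace that is not a line break) stands in for it; that substitution and the function name are assumptions made for illustration.

```python
import re

# [^\S\r\n] stands in for \h (horizontal whitespace), which Python's re lacks.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
LONG_LINE = re.compile(r"^[^\S\r\n]*(?:\S[^\S\r\n]*){13,}$", re.MULTILINE)

def lines_over_12_non_space(text):
    """Return every line with more than 12 non-whitespace characters."""
    return [m.group(0) for m in LONG_LINE.finditer(text)]

sample = "Mac and Cheese\nPeanut Butter and Jelly Sandwich"
print(lines_over_12_non_space(sample))
```

With the two example lines from the question, only the second line should be reported, since the first has exactly 12 non-space characters.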

Related

Regular wrong regular expression, not validating

I want to validate input from a user. The format is: 3 uppercase characters, 3 digits, an optional space, a -, an optional space, then either 'LAB' or ('EN' or 'ENLH' followed by 1 digit in the range 1-9).
The regex I wrote is
/\D{3}\d{3}\s?-\s?(LAB|(EN(LH)?\d{1}))/
I am finding it difficult to reject extra input after LAB, so that EEE333 - LAB1 is treated as invalid.
If you are asking how to prevent LAB1 at the end, use an end of line anchor $ in your regex test:
/\D{3}\d{3}\s?-\s?(LAB|(EN(LH)?\d{1}))$/
If you are trying to require exactly one digit at the end of the acceptable strings, move the single digit match outside of the optional groups:
/\D{3}\d{3}\s?-\s?(LAB|(EN(LH)?))\d{1}$/
I have written the following regular expression for you:
^[A-Z]{3}[0-9]{3}\s?-\s?(?:LAB|EN(?:LH)?[1-9])$
The regex works as follows:
^
MATCH THE START OF THE INPUT
[A-Z]{3}
MATCH EXACTLY THREE UPPERCASE CHARACTERS RANGING FROM A TO Z
[0-9]{3}
MATCH EXACTLY THREE DIGITS RANGING FROM 0 TO 9
\s?-\s?
MATCH AN OPTIONAL WHITESPACE CHARACTER, THEN A REQUIRED '-', THEN ANOTHER OPTIONAL WHITESPACE CHARACTER
(?:LAB|EN(?:LH)?[1-9])
MATCH 'LAB', OR 'EN' OR 'ENLH' FOLLOWED BY EXACTLY ONE DIGIT RANGING FROM 1 TO 9; THE ?: MAKES EACH GROUP NON-CAPTURING
$
MATCH THE END OF THE INPUT, SO THAT NOTHING MAY FOLLOW (FOR EXAMPLE THE 1 IN LAB1)
You could place your regex between word boundaries \b.
You start your regex with \D which is any character that is not a digit. That would for example also match $%^. You could use [A-Z].
You use \d{1}, which is shorthand for [0-9], but you want to match a digit between 1 and 9: [1-9]. You could also omit the {1}.
Maybe this updated pattern will work for you?
\b[A-Z]{3}\d{3} ?- ?(?:LAB|(?:EN(?:LH)?[1-9]))\b
Explanation
A word boundary \b
Match 3 uppercase characters [A-Z]{3}
Match 3 digits \d{3}
Match an optional space, a hyphen and another optional space ?- ?
A non-capturing group which matches LAB, or EN or ENLH followed by a digit from 1 to 9, for example EN1 or ENLH9 (?:LAB|(?:EN(?:LH)?[1-9]))
A word boundary \b
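To see how the anchoring affects inputs such as EEE333 - LAB1, here is a small Python sketch. It uses re.fullmatch, which requires the whole string to match and so plays the role of ^...$; the helper name is hypothetical and the pattern is one reading of the format described in the question.

```python
import re

# Hypothetical helper; the pattern follows the format described in the question.
CODE = re.compile(r"[A-Z]{3}[0-9]{3} ?- ?(?:LAB|EN(?:LH)?[1-9])")

def is_valid(code):
    """True only if the whole string matches (fullmatch acts like ^...$)."""
    return CODE.fullmatch(code) is not None

for s in ["EEE333 - LAB", "EEE333 - LAB1", "EEE333-EN1", "EEE333 - ENLH9", "EEE333 - EN0"]:
    print(s, is_valid(s))
```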

Combine these regex expressions

I have two regular expressions: ^(\\p{L}|[0-9]|_)+$ and #[^[:punct:][:space:]]+ (the first is used in Java, the second on iOS). I want to combine these into one expression, to match either one or the other in iOS.
The first one is for a username, so I also need to add a @ character to the start of that one. What would that look like?
The ^(\\p{L}|[0-9]|_)+$ pattern in Java matches the same way as in ICU library used in iOS (they are very similar): a whole string consisting of 1 or more Unicode letters, ASCII digits or _. It is poorly written as the alternation group is quantified and that is much less efficient than a character class based solution, ^[\\p{L}0-9_]+$.
The #[^[:punct:][:space:]]+ pattern matches a # followed by 1 or more chars other than punctuation/symbols and whitespace chars (that is, 1 or more alphanumeric chars).
What you seek can be written as
@[\\p{L}0-9_]+|#[^[:punct:][:space:]]+
or
@[\\p{L}0-9_]+|#[[:alnum:]]+
or if you want to limit to ASCII digits and not match Unicode digits:
@[\\p{L}0-9_]+|#[\\p{L}0-9]+
It matches
@ - a @ symbol
[\\p{L}0-9_]+ - 1 or more Unicode letters, ASCII digits, _
| - or
# - a # char
[[:alnum:]]+ - 1 or more letters or digits.
[^[:punct:][:space:]]+ - any 1+ chars other than punctuation/symbols and whitespace.
Basically, all these expressions match strings like this.
If you want to match #SomeThing_123 in full, just use [@#]\\w+, a @ or # and then 1 or more letters, digits or _, or to only allow ASCII digits, [@#][\\p{L}0-9_]+.
A word boundary may be required at the end of the pattern, [@#][\\p{L}0-9_]+\\b.
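For experimenting with the combined pattern outside iOS, here is a rough Python sketch. It assumes the two leading markers are @ (usernames) and # (tags). Python's re module supports neither \p{L} nor POSIX bracket classes, so [^\W\d_] is used as an approximation of "any letter"; that substitution and the function name are assumptions, not what the ICU engine does.

```python
import re

# Approximation: [^\W\d_] matches letters in Python's re (roughly \p{L});
# [0-9_] adds ASCII digits and the underscore.
MENTION_OR_TAG = re.compile(r"[@#](?:[^\W\d_]|[0-9_])+")

def find_mentions_and_tags(text):
    """Return every @name or #tag found in text."""
    return MENTION_OR_TAG.findall(text)

print(find_mentions_and_tags("hi @Jos\u00e9_1 and #tag_2!"))
```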

How do I match text with a regular expression ignoring punctuation and line breaks [duplicate]

I have an app where I need to find the position of a list of words in a passage of text. A regex is clearly the way to do this, but the issue I have is that I may have all kinds of punctuation or new lines between words. How do I do "find these words possibly separated by some non-alphanumeric characters"?
UPDATE:
An example would be that I need to find the range of:
shouted help these regular expressions are horrible so
in
The developer shouted "help", these regular expressions are horrible!
So, please help me :(
Description
\b(?:[a-z](?:[a-z\n\r.:;,?!-]*[a-z])?)\b
This regular expression will do the following:
Requires all words to start and end with a-z, or be a single letter long
Allows words to contain new line characters, or common punctuation like .:;,?!-
Words are not allowed to contain spaces
Example
Live Demo
https://regex101.com/r/bK4oO8/1
Sample text
How do I match text with a regular expres
sion ignoring punctuation and line breaks?
How do I do "find these words pos-
sibly separated but some non-alphanumeric characters"?
Sample Matches
MATCH 1
0. [0-3] `How`
MATCH 2
0. [4-6] `do`
MATCH 3
0. [7-8] `I`
MATCH 4
0. [9-14] `match`
MATCH 5
0. [15-19] `text`
MATCH 6
0. [20-24] `with`
MATCH 7
0. [25-26] `a`
MATCH 8
0. [27-34] `regular`
MATCH 9
0. [35-46] `expres
sion`
MATCH 10
0. [47-55] `ignoring`
MATCH 11
0. [56-67] `punctuation`
MATCH 12
0. [68-71] `and`
MATCH 13
0. [72-76] `line`
MATCH 14
0. [77-88] `breaks?
How`
MATCH 15
0. [89-91] `do`
MATCH 16
0. [92-93] `I`
MATCH 17
0. [94-96] `do`
MATCH 18
0. [98-102] `find`
MATCH 19
0. [103-108] `these`
MATCH 20
0. [109-114] `words`
MATCH 21
0. [115-125] `pos-
sibly`
MATCH 22
0. [126-135] `separated`
MATCH 23
0. [136-139] `but`
MATCH 24
0. [140-144] `some`
MATCH 25
0. [145-161] `non-alphanumeric`
MATCH 26
0. [162-172] `characters`
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
\b                       the boundary between a word char (\w) and
                         something that is not a word char
----------------------------------------------------------------------
(?:                      group, but do not capture:
----------------------------------------------------------------------
[a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
(?:                      group, but do not capture (optional
                         (matching the most amount possible)):
----------------------------------------------------------------------
[a-z\n\r.:;,?!-]*        any character of: 'a' to 'z', '\n'
                         (newline), '\r' (carriage return),
                         '.', ':', ';', ',', '?', '!', '-' (0
                         or more times (matching the most
                         amount possible))
----------------------------------------------------------------------
[a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
)?                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
\b                       the boundary between a word char (\w) and
                         something that is not a word char
----------------------------------------------------------------------
Extra Credit
You may also want to eliminate matches like #14 above, where a ? is followed by a new line character. In this configuration the ? should not be considered part of the word, whereas a - followed by a new line really is a hyphen. In that case you should consider this:
\b(?:[a-z](?:(?:[a-z-]+|[.:;,?!-]+(?![\n\r])|[\n\r]+)*[a-z])?)\b
Live Demo: https://regex101.com/r/bK4oO8/2
I figured it out:
let pattern = String(format: "(\\b%@\\b)", words.joinWithSeparator("[^a-zA-Z\\d\\s:]?[ ]"))
The '\b' gives word boundaries; the pattern then matches the words separated by an optional punctuation character and then a space. I will probably have to add a few bits for double punctuation, but it works for now.
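The same join-the-words idea can be sketched in Python. Note that this sketch swaps in a looser separator than the one above: any run of non-alphanumeric characters, so that double punctuation and line breaks between words are also accepted. The function name and the case-insensitive matching are assumptions made for illustration.

```python
import re

def find_phrase(words, text):
    """Return the (start, end) span of the words in text, or None."""
    # Words may be separated by any run of non-alphanumeric characters.
    separator = r"[^A-Za-z0-9]+"
    pattern = r"\b" + separator.join(re.escape(w) for w in words) + r"\b"
    match = re.search(pattern, text, re.IGNORECASE)
    return match.span() if match else None

text = 'The developer shouted "help", these regular expressions are horrible!\nSo, please help me :('
words = "shouted help these regular expressions are horrible so".split()
span = find_phrase(words, text)
print(span, repr(text[span[0]:span[1]]))
```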

Ultraedit regex to remove all words which contains number

I am trying to make an UltraEdit regex which allows me to remove all words containing a number from a txt file.
For example:
test
test2
t2est
te2st
and...
get only
test
A case-insensitive search with Perl regular expression search string \<[a-z]+\d\w*\> finds entire words that start with a letter and contain at least 1 digit.
\< ... beginning of a word. \b for any word boundary could be also used.
[a-z]+ ... any letter 1 or more times. You can put additional characters into the square brackets like ÄÖÜäöüß also used in language of text file.
\d ... any digit, i.e. 0-9.
\w* ... any word character 0 or more times. Any word character means all word characters according to Unicode table which includes language dependent word characters, all digits and the underscore.
\> ... end of a word. \b for any word boundary could be also used.
A case-insensitive search with UltraEdit regular expression search string [a-z]+[0-9][a-z0-9_]++ also finds entire words containing at least 1 digit, provided the find option Match whole word is checked as well.
[a-z]+ ... any letter 1 or more times. You can put additional characters into the square brackets used in language of text file.
[0-9] ... any digit.
[a-z0-9_]++ ... any letter, digit or underscore 0 or more times.
The UltraEdit regexp search string [a-z]+[0-9][a-z0-9_]++ in Unix/Perl syntax would be [a-z]+[0-9][a-z0-9_]* which could be also used with find option Match whole word checked instead of the Perl regexp search.
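For readers who want to test the Perl-style expression outside UltraEdit, here is a minimal Python sketch. Python's re module has no \< and \> anchors, so \b is used on both sides, as the answer suggests is possible; the function names are hypothetical.

```python
import re

# \b replaces \< and \>, which Python's re does not support.
WORD_WITH_DIGIT = re.compile(r"\b[a-z]+\d\w*\b", re.IGNORECASE)

def words_with_digit(text):
    """Return the words that start with a letter and contain a digit."""
    return WORD_WITH_DIGIT.findall(text)

def strip_words_with_digit(text):
    """Remove those words, leaving the rest of the text (and line breaks) alone."""
    return WORD_WITH_DIGIT.sub("", text)

sample = "test\ntest2\nt2est\nte2st"
print(words_with_digit(sample))
```

Removing the matches leaves empty lines behind; deleting those lines would need a separate step.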

Difference between \b and \s in Regular Expression

I was learning regular expressions in iOS and saw this tutorial: http://www.raywenderlich.com/30288/nsregularexpression-tutorial-and-cheat-sheet
It reads like this for \b:
\b matches word boundary characters such as spaces and punctuation. to\b will match the "to" in "to the moon" and "to!", but it will not match "tomorrow". \b is handy for "whole word" type matching.
and \s:
\s matches whitespace characters such as spaces, tabs, and newlines. hello\s will match "hello " in "Well, hello there!".
I have two questions on this:
1) what is the difference between \s and \b? when to use which?
2) \b is handy for "whole word" type matching -> Don't understand the meaning..
Need some guidance on these two.
\b Boundary characters
\b matches the boundary itself but not the boundary character (like a comma or period). It has no length in itself but can be used to find, for example, an e at the end of a word.
For example in the sentence: "Hello there, this is one test. Testing"
The regex e\b will match an e if it's at the end of a word (followed by a word boundary). Here it matches the final e in "there" and "one". The e in "Hello", "test" and "Testing" does not match, since that e is not followed by a boundary.
\s Whitespace
\s on the other hand matches the actual white space characters (like spaces and tabs). In the same sentence it will match all the spaces between the words.
Edit
Since \b doesn't make much sense alone, I showed how to use it as e\b (above). The OP asked (in a comment) what e\s would match compared to e\b, to better explain the difference between \b and \s.
In the same string there is only one match for e\s, while there were two matches for e\b, since the comma is not whitespace. Note that the e\s match includes the white space, whereas the e\b match doesn't.
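The comparison can be reproduced with a few lines of Python. This is only a sketch using the sentence from this answer; the variable names are made up for illustration.

```python
import re

sentence = "Hello there, this is one test. Testing"

# e\b: an e followed by a word boundary (zero-width, not part of the match).
boundary_matches = re.findall(r"e\b", sentence)

# e\s: an e followed by a whitespace character, which becomes part of the match.
whitespace_matches = re.findall(r"e\s", sentence)

print(boundary_matches)    # the final e of "there" and of "one"
print(whitespace_matches)  # only the "e " at the end of "one"
```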
\b matches a word boundary. It is a zero-width assertion, meaning it does not match a character; it matches a position where a certain condition is true.
\b is related to \w. \w defines "word characters", meaning letters, digits and underscores. So \b matches at a change from a word character to a non-word character, or the other way round. This means it matches at the start and end of a word, but not the character before or after the word.
\s is a predefined character class that is matching any whitespace character.
\b is zero-width. That is, it doesn't actually match any character. Meanwhile, \s does match a character. This is an important distinction for capturing and more complicated regular expressions.
For example, say you're trying to match numbers that begin with multiple zeros, like 007 or 000101101. You might try:
0+\d*
But see, that would also match 1007 and 101000101101! So then, you might try:
\s0+\d*
But see how that wouldn't match a 007 at the beginning of the string (because there's no space character)? Using \b allows you to get the "whole word (or number)":
\b0+\d*
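The three attempts can be compared side by side in Python. This is a small sketch on a made-up input string that contains the numbers mentioned above.

```python
import re

text = "007 1007 000101101"

# No anchor: also finds the "007" inside "1007".
unanchored = re.findall(r"0+\d*", text)

# \s consumes a whitespace character, so the leading "007" is missed
# and the match includes the space.
after_space = re.findall(r"\s0+\d*", text)

# \b is zero-width: it works at the start of the string and adds nothing to the match.
at_boundary = re.findall(r"\b0+\d*", text)

print(unanchored)
print(after_space)
print(at_boundary)
```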
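\b matches the position between a word character (a letter, digit or underscore) and a non-word character, without including any character in the match.
\s matches only whitespace characters.
For example:
o\b would match an o followed by any of these: "!?,.@#$%^&*()+ " (but not _, which is a word character).
$text = "Hello, Yo! moo .";
$regex = "~o\b~";
^---Will match three o's: the last o of each word.
$text = "Hello, Yo! moo .";
$regex = "~o\s~";
^---Will only match the last 'o' in 'moo' (together with the space after it).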

Resources