I'm trying to find long quotes in the text that I'm editing so that I can apply a different style to them. I've tried this GREP:
~[.{230}(?!.~])
What I need is for the GREP to find any 230 characters preceded by a left/opening quote mark, not including any 230-character sequence including a character followed by a right/clsoing quote mark. This should then eliminate quotes of less than 230 characters from the search. My GREP finds the correct length sequence but doesn't exclude those sequences which include a right quote mark.
So I want to find this, which my GREP does:
But not this, which my GREP also finds:
Because it has a closing quote in it and is therefore what I'm classing as a short quote.
Any ideas? TIA
You can match an opening ‘ followed by 230 or more occurrences of any character except an opening or closing quotation mark.
To not match the closing quotation mark, you can assert it using a positive lookahead.
‘[^‘’]{230,}(?=’)
‘ Match ‘
[^‘’]{230,} Repeat 230+ times any char except ‘ or ’ using a negated character class
(?=’) Positive lookahead, assert ’ directly to the right
See a regex demo.
Thanks #Thefourthbird.
So what I needed was:
‘[^’]{230,}
to search for an opening apostrophe ‘ followed by anything but a closing apostrophe [^’] of 230 characters or more {230,}
Strangely, if you use InDesign's code for left ~[ and right ]~ apostrophe it doesn't work!
Related
The txt file is :
bar
quux
kabe
Ass
sBo
CcdD
FGH
I would like to grep the words with only one capital letter in this example, but when I use "grep [A-Z]", it shows me all words with capital letters.
Could anyone find the "grep" solution here? My expected output is
Ass
sBo
grep '\<[a-z]*[A-Z][a-z]*\>' my.txt
will match lines in the ASCII text file my.txt if they contain at least one word consisting entirely of ASCII letters, exactly one of which is upper case.
You seem to have a text file with each word on its own line.
You may use
grep '^[[:lower:]]*[[:upper:]][[:lower:]]*$' file
See the grep online demo.
The ^ matches the start of string (here, line since grep operates on a line by lin basis by default), then [[:lower:]]* matches 0 or more lowercase letters, then an [[:upper:]] pattern matches any uppercase letter, and then [[:lower:]]* matches 0+ lowercase letters and $ asserts the position at the end of string.
If you need to match a whole line with exactly one uppercase letter you may use
grep '^[^[:upper:]]*[[:upper:]][^[:upper:]]*$' file
The only difference from the pattern above is the [^[:upper:]] bracket expression that matches any char but an uppercase letter. See another grep online demo.
To extract words with a single capital letter inside them you may use word boundaries, as shown in mathguy's answer. With GNU grep, you may also use
grep -o '\b[^[:upper:]]*[[:upper:]][^[:upper:]]*\b' file
grep -o '\b[[:lower:]]*[[:upper:]][[:lower:]]*\b' file
See yet another grep online demo.
I have a word list of over 10,000 words, but this is just a sample:
'Tis midnight
sev'n words spoke
th'Immortal night
A wonder-working pow'r
Wondrous deliv'rer to me
I want to delete all words that contain apostrophes so the list should look like this:
midnight
words spoke
night
A wonder-working
Wondrous to me
How can I do this using Sublime Text so it finds apostrophes and smart apostrophes (’)?
You could use a character class['’] to match both variations of the apostrophes and match zero or more times a non-whitespace character \S* before or after the matched apostrophe followed by optional horizontal white-space chars.
\S*['’]\S*\h*
Regex demo
A slightly more optimized version without preventing the first \S* causing backtracking could be using a negated character class [^\s'’]* to match until the first apostrophe.
[^\s'’]*['’]\S*\h*
Regex demo
How can I use the BBedit grep option to replace LaTeX commands like
\textcolor{blue}{Some text}
by the contents of the second set of braces, so
Some text
?
The BBEdit Grep Tutorial gives a lot of information and good examples on using the grep option in BBEdit. What you are trying to achieve is actually a variation of one of the examples. The solution is to enter the following:
Find: \\textcolor\{blue\}\{([^\}]*)\}
Replace: \1
The relevant part is the "Find" section. The first part: \\textcolor\{blue\}\{ basically searches for the content \textcolor{blue}{. You need the \s to escape special characters.
Next, we have the cryptic sequence ([^\}]*): The (...) saves everything inside the parentheses into the variable \1, which you can use in the "Replace" section to insert the content. The [^\}]* consists of ^\} which means match all characters which are not ^ a closing brace \}. With [...]* we say, match any number of "not brace" characters. Overall, this expression makes the grep match all characters which are not closing braces, and saves them into \1.
Finally, the expression ends with a \}, i.e. a closing brace, which is the end of what we want to find.
The "Replace" only contains \1, which is everything inside the parentheses (...) in the "Find" field.
On The Lua Interpreter
>print("This is a string
>>spread over multiline")
stdin:1: unfinished string near '"This is a'
Since we know on the Lua interpreter we can finish a statement over mulitline
For eg
>a=2
>a=a+
>>1
This works perfectly
Again:
>print([[This is a multiline
>>string]])
This is a multiline
string
This works fine!! then why display error in the first print() statement??
Read the fine Reference Manual:
3.1 – Lexical Conventions
[…]
A short literal string can be delimited by matching single or double
quotes, and can contain the following C-like escape sequences: '\a' (bell),
'\b' (backspace), '\f' (form feed), '\n' (newline), '\r' (carriage
return), '\t' (horizontal tab), '\v' (vertical tab), '\\' (backslash),
'\"' (quotation mark [double quote]), and '\'' (apostrophe [single
quote]). A backslash followed by a line break results in a newline in the
string. The escape sequence '\z' skips the following span of white-space
characters, including line breaks; it is particularly useful to break and
indent a long literal string into multiple lines without adding the newlines
and spaces into the string contents. A short literal string cannot contain
unescaped line breaks nor escapes not forming a valid escape sequence.
[…]
Literal strings can also be defined using a long format enclosed by long
brackets. We define an opening long bracket of level n as an opening
square bracket followed by n equal signs followed by another opening
square bracket. So, an opening long bracket of level 0 is written as [[,
an opening long bracket of level 1 is written as [=[, and so on. A
closing long bracket is defined similarly; for instance, a closing long
bracket of level 4 is written as ]====]. A long literal starts with an
opening long bracket of any level and ends at the first closing long
bracket of the same level. It can contain any text except a closing
bracket of the same level. Literals in this bracketed form can run for
several lines, do not interpret any escape sequences, and ignore long
brackets of any other level. Any kind of end-of-line sequence (carriage
return, newline, carriage return followed by newline, or newline followed
by carriage return) is converted to a simple newline.
I have following regex handy to match all the lines containing console.log() or alert() function in any javascript file opened in the editor supporting PCRE.
^.*\b(console\.log|alert)\b.*$
But I encounter many files containing window.alert() lines for alerting important messages, I don't want to remove/replace them.
So the question how to regex-match (single line regex without need to run frequently) all the lines containing console.log() and alert() but not containing word window. Also how to escape round brackets(parenthesis) which are unescapable by \, to make them part of string literal ?
I tried following regex but in vain:
^.*\b(console\.log|alert)((?!window).)*\b.*$
You should use a negative lookhead, like this:
^(?!.*window\.).*\b(console\.log|alert)\b.*$
The negative lookhead will assert that it is impossible to match if the string window. is present.
Regex Demo
As for the parenthesis, you can escape them with backslashes, but because you have a word boundary character, it will not match if you put the escaped parenthesis, because they are not word characters.
The metacharacter \b is an anchor like the caret and the dollar sign.
It matches at a position that is called a "word boundary". This match
is zero-length.
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a
word character.
After the last character in the string, if the last
character is a word character.
Between two characters in the string,
where one is a word character and the other is not a word character.