Is there a way to convert a quoted string to a multiline string?
Something like "This string \66 here" to [[This string \66 here]] since I would like to ignore the interpretation of escaped characters.
Lua 5.3 Reference Manual 3.1: Lexical Conventions
Literal strings can also be defined using a long format enclosed by
long brackets. We define an opening long bracket of level n as an
opening square bracket followed by n equal signs followed by another
opening square bracket. So, an opening long bracket of level 0 is
written as [[, an opening long bracket of level 1 is written as [=[,
and so on. A closing long bracket is defined similarly; for instance,
a closing long bracket of level 4 is written as ]====]. A long literal
starts with an opening long bracket of any level and ends at the first
closing long bracket of the same level. It can contain any text except
a closing bracket of the same level. Literals in this bracketed form
can run for several lines, do not interpret any escape sequences, and
ignore long brackets of any other level. Any kind of end-of-line
sequence (carriage return, newline, carriage return followed by
newline, or newline followed by carriage return) is converted to a
simple newline.
For convenience, when the opening long bracket is immediately followed
by a newline, the newline is not included in the string.
That's all you need to know about long strings.
It does not make much sense to convert a string defined with quotes, such as "some string", into one like [[some string]]: neither the quotes nor the square brackets are part of the string, so the resulting value is the same.
The only differences are a leading newline, which is ignored in square brackets, and escape sequences, which are not interpreted.
Quotes and square brackets are only part of the string if you have nested strings. In this case conversion also doesn't make much sense because you cannot nest strings with quotes like strings with brackets.
Maybe your whole approach is a bit off?
Are you looking for something like this?
local db = "google"
local tbl = "accounts"
local where = "field = 'VALUE' AND TRUE"
local order = "id DESC"
local query = string.format([[
SELECT *
FROM `%s`.`%s`
WHERE %s
ORDER BY %s
]], db, tbl, where, order)
This is an exercise question from the book Programming in Lua, 3rd edition.
Exercise 2.4: How can you embed the following piece of XML as a string in Lua?
Show at least two different ways.
Here is my answer:
s = "<![CDATA\n Hello world\n]]>"
print(s)
s2 = [[
<![CDATA
Hello world
\]\]>
]]
print(s2)
and the output:
<![CDATA
Hello world
]]>
<![CDATA
Hello world
\]\]>
Way 1 is right. The output of way 2 is not as expected. Without the backslash characters, Lua reports an error:
lua: execrcise-4.1.lua:7: unexpected symbol near ']'
So my question is: how do I escape brackets in a multi-line string in Lua?
My Lua interpreter version is 5.4.2.
Actually the whole point of this exercise is that you find out how to solve this problem.
Ideally by reading the Lua manual.
There you'll learn that opening and closing brackets for long strings have levels.
Literal strings can also be defined using a long format enclosed by
long brackets. We define an opening long bracket of level n as an
opening square bracket followed by n equal signs followed by another
opening square bracket. So, an opening long bracket of level 0 is
written as [[, an opening long bracket of level 1 is written as [=[,
and so on. A closing long bracket is defined similarly; for instance,
a closing long bracket of level 4 is written as ]====]. A long literal
starts with an opening long bracket of any level and ends at the first
closing long bracket of the same level. It can contain any text except
a closing bracket of the same level.
s2 = [[
<![CDATA
Hello world
]]>
]]
violates the last rule quoted above (a long literal cannot contain a closing bracket of its own level): the first ]] closes the long string prematurely, leaving you with two extra brackets that cause a syntax error.
So what do you need to do if the string may not contain a closing bracket of level 0 ]] ? We increase the level of our long string.
s2 = [=[
<![CDATA
Hello world
]]>
]=]
You cannot escape a square bracket with a backslash in a Lua string btw.
The only reason why you didn't get an error for the invalid escape sequence \] is that long strings ignore escape sequences.
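The level-raising rule above can also be applied mechanically. Here is a minimal Python sketch that wraps arbitrary text in the lowest-level long bracket that is safe for it. The helper name lua_long_literal is hypothetical (it is not part of Lua or of any library), and the sketch assumes the text contains no carriage returns, since Lua normalizes end-of-line sequences inside long strings.

```python
def lua_long_literal(text):
    """Return Lua source for a long-bracket literal whose value is `text`."""
    level = 0
    while True:
        closer = "]" + "=" * level + "]"
        # The closer must not appear inside the text, and the end of the text
        # must not combine with the start of the closer to form an early closer.
        if closer not in text + closer[:-1]:
            break
        level += 1
    opener = "[" + "=" * level + "["
    # Lua drops a newline that immediately follows the opening bracket,
    # so emitting one here keeps any leading newline of `text` intact.
    return opener + "\n" + text + closer


print(lua_long_literal("<![CDATA\n  Hello world\n]]>"))
```

For the CDATA text from the exercise, the sketch picks level 1, because the text itself contains ]].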
I am trying to get rid of shortcodes inside a Google Sheet column. I have many items such as [spacer type="1" height="20"][spacer] or [FinalTilesGallery id="37"] and I would just like to remove them. Is there any simple way to do it?
Thanks!
For in-place replacement, the quick option would be to use the Find and Replace dialog (Ctrl + H) with Search Using Regular Expressions turned on, which is more powerful than your standard Find and Replace.
Find: \[.*?\] - Match anything within an open-bracket up to the very next close-bracket. This should work assuming you have no nested brackets, e.g. [[no][no]].
If you do have nested brackets, you'll have to change this to \[[^\[\]]*\]. And continue to Replace All until all the codes are gone.
Replace: Nothing.
Replace All. If you don't want to affect other sheets that may be in your document, make sure you select the right range to work with, too.
This just erases everything within the brackets.
If you want to erase any redundant spaces left by this, simply Find and Replace again (with Regular Expressions) on " +" (a space followed by a plus), which matches one or more spaces, and replace with " " (a single space).
E.g.:
"string [] [] string2" becomes "string   string2" (three spaces) after the shortcode replacement.
After replacing spaces, it becomes "string string2".
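If the column is ever exported and cleaned outside of Sheets, the same two passes can be sketched in Python. This is only an illustration under that assumption: the helper name strip_shortcodes is made up, and it uses Python's re module rather than the Sheets dialog.

```python
import re


def strip_shortcodes(text):
    # Remove innermost [...] groups repeatedly so nested brackets also disappear,
    # mirroring "Replace All until all the codes are gone".
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\[[^\[\]]*\]", "", text)
    # Collapse runs of spaces left behind, then trim both ends.
    return re.sub(r" +", " ", text).strip()


print(strip_shortcodes('Intro [spacer type="1" height="20"][spacer] text [FinalTilesGallery id="37"] end'))
```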
Let's say your original strings are in the range A2:A. Place the following into B2 of an otherwise completely empty Column B (or the second cell of any other empty column):
=ArrayFormula(IF(A2:A="",,TRIM(REGEXREPLACE(A2:A,"\[[^\[\]]+\]",""))))
I can't see your data, so I don't know what kind of information is between these shortcodes. If you find that this leaves you with concatenated pieces of data where there should be spaces between them, replace the above with this version:
=ArrayFormula(IF(A2:A="",,TRIM(REGEXREPLACE(SUBSTITUTE(SUBSTITUTE(A2:A,"["," ["),"]","] "),"\[[^\[\]]+\]",""))))
I can't teach regular expression language here. But I will note that, since square brackets have specific meaning within regex, your literal square brackets must be indicated with the escape character: the backslash.
Here is the regex expression alone:
\[[^\[\]]+\]
The opening \[ and the closing \], then, reference your actual opening and closing bracket sets. If we remove those, we have this left:
[^\[\]]+
Again, you see the escaped opening and closing square brackets, which I'll replace with the word these:
[^these]+
What remains there are opening and closing brackets with regex meaning, i.e., "anything in this group." And the circumflex symbol ^ as the first character within this set of square brackets means "anything except." The + symbol means "in any string length of one or more characters."
So that whole regex expression then reads: "A literal open square bracket, followed by one or more characters that are anything except square brackets, ending with a literal closing square bracket."
And we are REGEXREPLACE-ing any instance of that with "" (i.e., nothing).
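To see the reading above in action outside of Sheets, here is a small Python sketch using the same expression. The sample text and the names SHORTCODE and remove_shortcodes are invented for illustration, and .strip() only stands in roughly for TRIM by removing leading and trailing whitespace.

```python
import re

# Literal "[", then one or more characters that are not square brackets, then literal "]".
SHORTCODE = re.compile(r"\[[^\[\]]+\]")


def remove_shortcodes(cell):
    return SHORTCODE.sub("", cell).strip()


sample = 'Title[spacer type="1" height="20"][spacer] body [] tail'
print(SHORTCODE.findall(sample))
print(remove_shortcodes(sample))
```

Because of the +, an empty pair of brackets such as [] is left alone; changing + to * would remove it as well.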
I need a regular expression able to match everything but a string starting with a specific pattern (specifically index.php and what follows, like index.php?id=2342343).
Regex: match everything but:
a string starting with a specific pattern (e.g. any - empty, too - string not starting with foo):
Lookahead-based solution for NFAs:
^(?!foo).*$
^(?!foo)
Negated character class based solution for regex engines not supporting lookarounds:
^(([^f].{2}|.[^o].|.{2}[^o]).*|.{0,2})$
^([^f].{2}|.[^o].|.{2}[^o])|^.{0,2}$
a string ending with a specific pattern (say, no world. at the end):
Lookbehind-based solution:
(?<!world\.)$
^.*(?<!world\.)$
Lookahead solution:
^(?!.*world\.$).*
^(?!.*world\.$)
POSIX workaround:
^(.*([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.])|.{0,5})$
([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.]$|^.{0,5})$
a string containing specific text (say, not match a string having foo):
Lookaround-based solution:
^(?!.*foo)
^(?!.*foo).*$
POSIX workaround:
Use the online regex generator at www.formauri.es/personal/pgimeno/misc/non-match-regex
a string containing specific character (say, avoid matching a string having a | symbol):
^[^|]*$
a string equal to some string (say, not equal to foo):
Lookaround-based:
^(?!foo$)
^(?!foo$).*$
POSIX:
^(.{0,2}|.{4,}|[^f]..|.[^o].|..[^o])$
a sequence of characters:
PCRE (match any text but cat): /cat(*SKIP)(*FAIL)|[^c]*(?:c(?!at)[^c]*)*/i or /cat(*SKIP)(*FAIL)|(?:(?!cat).)+/is
Other engines allowing lookarounds: (cat)|[^c]*(?:c(?!at)[^c]*)* (or (?s)(cat)|(?:(?!cat).)*, or (cat)|[^c]+(?:c(?!at)[^c]*)*|(?:c(?!at)[^c]*)+[^c]*) and then check with language means: if Group 1 matched, it is not what we need, else, grab the match value if not empty
a certain single character or a set of characters:
Use a negated character class: [^a-z]+ (any char other than a lowercase ASCII letter)
Matching any char(s) but |: [^|]+
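The "sequence of characters" case in the list above relies on checking Group 1 in code. A minimal Python sketch of that check, assuming the tempered-dot variant of the pattern (the helper name text_outside is hypothetical):

```python
import re

# Group 1 captures the text we want to skip; the second branch consumes everything else.
PATTERN = re.compile(r"(cat)|(?:(?!cat).)+", re.DOTALL)


def text_outside(text):
    # Keep only the matches where Group 1 did not participate.
    return [m.group(0) for m in PATTERN.finditer(text) if m.group(1) is None]


print(text_outside("concatenate cats"))  # ['con', 'enate ', 's']
```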
Demo note: the newline \n is used inside negated character classes in demos to avoid match overflow to the neighboring line(s). They are not necessary when testing individual strings.
Anchor note: In many languages, use \A to define the unambiguous start of string, and \z (in Python, it is \Z, in JavaScript, $ is OK) to define the very end of the string.
Dot note: In many flavors (but not POSIX, TRE, TCL), . matches any char but a newline char. Make sure you use a corresponding DOTALL modifier (/s in PCRE/Boost/.NET/Python/Java and /m in Ruby) for the . to match any char including a newline.
Backslash note: In languages where you have to declare patterns with C strings allowing escape sequences (like \n for a newline), you need to double the backslashes escaping special characters so that the engine can treat them as literal characters (e.g. in Java, world\. will be declared as "world\\.", or use a character class: "world[.]"). Alternatively, use raw string literals (Python r'\bworld\b'), C# verbatim string literals (@"world\."), or slashy strings/regex literal notations like /world\./.
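As a quick sanity check, the lookahead pattern and the lookaround-free workaround for "not starting with foo" can be compared on a few inputs. This is a sketch with hand-picked samples, not a proof of equivalence; raw strings are used as suggested in the backslash note.

```python
import re

LOOKAHEAD = re.compile(r"^(?!foo).*$")
NO_LOOKAROUND = re.compile(r"^(([^f].{2}|.[^o].|.{2}[^o]).*|.{0,2})$")

samples = ["", "fo", "foo", "foobar", "fob", "bar", "xfoo"]

# Strings accepted by each pattern; on these samples the two lists agree.
accepted_lookahead = [s for s in samples if LOOKAHEAD.match(s)]
accepted_workaround = [s for s in samples if NO_LOOKAROUND.match(s)]

print(accepted_lookahead)
print(accepted_workaround)
```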
You could use a negative lookahead from the start, e.g., ^(?!foo).*$ shouldn't match anything starting with foo.
You can put a ^ in the beginning of a character set to match anything but those characters.
[^=]*
will match everything but =
Just match /^index\.php/, and then reject whatever matches it.
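The match-then-reject idea can be sketched in a few lines of Python. The list of URLs and the helper name keep_non_index are made up for illustration.

```python
import re

INDEX_PHP = re.compile(r"^index\.php")


def keep_non_index(urls):
    # Match the unwanted prefix, then invert the result in code.
    return [u for u in urls if not INDEX_PHP.match(u)]


print(keep_non_index(["index.php?id=2342343", "index.html", "about.php", "index.php"]))
```

This keeps the regex itself trivial and moves the negation into ordinary program logic.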
In Python:
>>> import re
>>> p=r'^(?!index\.php\?[0-9]+).*$'
>>> s1='index.php?12345'
>>> re.match(p,s1)
>>> s2='index.html?12345'
>>> re.match(p,s2)
<_sre.SRE_Match object at 0xb7d65fa8>
I came across this thread after a long search. I needed to search for and replace several occurrences in one string, but the pattern I used matched all the way to the last occurrence. Example below:
import re
text = "start![image]xxx(xx.png) yyy xx![image]xxx(xxx.png) end"
replaced_text = re.sub(r'!\[image\](.*)\(.*\.png\)', '*', text)
print(replaced_text)
gave
start* end
Basically, the regex was matching from the first ![image] to the last .png, swallowing the yyy in the middle.
I used the method posted above (https://stackoverflow.com/a/17761124/429476 by Firish) to stop the match from running across occurrences. Here the space is excluded from the match, since the words are separated by spaces.
replaced_text = re.sub(r'!\[image\]([^ ]*)\([^ ]*\.png\)', '*', text)
and got what I wanted
start* yyy xx* end
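For comparison, a lazy quantifier is another common way to stop the match early. This is a sketch of that alternative, not the method used above; on this particular input it gives the same result.

```python
import re

text = "start![image]xxx(xx.png) yyy xx![image]xxx(xxx.png) end"

# .*? expands one character at a time, so each match ends at the nearest ".png)".
lazy = re.sub(r"!\[image\](.*?)\(.*?\.png\)", "*", text)
print(lazy)  # start* yyy xx* end
```

Note that the lazy version can still run across occurrences if the first ![image] is not followed by its own .png), so the negated character class is the stricter choice.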
On The Lua Interpreter
>print("This is a string
>>spread over multiline")
stdin:1: unfinished string near '"This is a'
Since we know that on the Lua interpreter a statement can be finished over multiple lines.
For example:
>a=2
>a=a+
>>1
This works perfectly
Again:
>print([[This is a multiline
>>string]])
This is a multiline
string
This works fine! Then why does the first print() statement produce an error?
Read the fine Reference Manual:
3.1 – Lexical Conventions
[…]
A short literal string can be delimited by matching single or double
quotes, and can contain the following C-like escape sequences: '\a' (bell),
'\b' (backspace), '\f' (form feed), '\n' (newline), '\r' (carriage
return), '\t' (horizontal tab), '\v' (vertical tab), '\\' (backslash),
'\"' (quotation mark [double quote]), and '\'' (apostrophe [single
quote]). A backslash followed by a line break results in a newline in the
string. The escape sequence '\z' skips the following span of white-space
characters, including line breaks; it is particularly useful to break and
indent a long literal string into multiple lines without adding the newlines
and spaces into the string contents. A short literal string cannot contain
unescaped line breaks nor escapes not forming a valid escape sequence.
[…]
Literal strings can also be defined using a long format enclosed by long
brackets. We define an opening long bracket of level n as an opening
square bracket followed by n equal signs followed by another opening
square bracket. So, an opening long bracket of level 0 is written as [[,
an opening long bracket of level 1 is written as [=[, and so on. A
closing long bracket is defined similarly; for instance, a closing long
bracket of level 4 is written as ]====]. A long literal starts with an
opening long bracket of any level and ends at the first closing long
bracket of the same level. It can contain any text except a closing
bracket of the same level. Literals in this bracketed form can run for
several lines, do not interpret any escape sequences, and ignore long
brackets of any other level. Any kind of end-of-line sequence (carriage
return, newline, carriage return followed by newline, or newline followed
by carriage return) is converted to a simple newline.
I have the following regex handy to match all the lines containing the console.log() or alert() function in any JavaScript file opened in an editor supporting PCRE.
^.*\b(console\.log|alert)\b.*$
But I encounter many files containing window.alert() lines for alerting important messages, and I don't want to remove or replace them.
So the question is: how can I regex-match (with a single-line regex that does not need to be run repeatedly) all the lines containing console.log() and alert() but not containing the word window? Also, how do I escape round brackets (parentheses), which I could not escape with \, so that they are treated as literal characters?
I tried the following regex, but in vain:
^.*\b(console\.log|alert)((?!window).)*\b.*$
You should use a negative lookahead, like this:
^(?!.*window\.).*\b(console\.log|alert)\b.*$
The negative lookahead asserts that the line cannot match if the string window. is present.
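Here is a minimal sketch of that pattern applied line by line. It uses Python's re module with the MULTILINE flag rather than a PCRE editor, and the sample source lines are invented.

```python
import re

DEBUG_LINE = re.compile(r"^(?!.*window\.).*\b(console\.log|alert)\b.*$", re.MULTILINE)

source = "\n".join([
    "console.log('debug');",
    "window.alert('Important!');",
    "  alert('oops');",
    "var total = 1;",
])

# Whole lines that contain console.log or alert but not "window."
matches = [m.group(0) for m in DEBUG_LINE.finditer(source)]
print(matches)
```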
As for the parentheses, you can escape them with backslashes. But with a word boundary in the pattern, a \b placed directly after an escaped parenthesis will usually not match, because parentheses are not word characters.
The metacharacter \b is an anchor like the caret and the dollar sign.
It matches at a position that is called a "word boundary". This match
is zero-length.
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
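The three positions, and the effect on escaped parentheses, can be checked with a short Python sketch. The sample strings are made up; Python's \b follows the same definition for these ASCII inputs.

```python
import re

# Zero-length matches of \b: start of "hi", end of "hi", start of "there", end of "there".
positions = [m.start() for m in re.finditer(r"\b", "hi there")]
print(positions)  # [0, 2, 3, 8]

# An escaped parenthesis works when \b sits before the word...
before_ok = re.search(r"\balert\(", "alert('x');") is not None
# ...but \b right after ")" fails here, since ")" and ";" are both non-word characters.
after_fails = re.search(r"alert\(\)\b", "alert();") is None
print(before_ok, after_fails)  # True True
```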