This is an exercise from the book Programming in Lua, 3rd edition.
Exercise 2.4: How can you embed the following piece of XML as a string in Lua?
Show at least two different ways.
Here is my answer:
s = "<![CDATA\n Hello world\n]]>"
print(s)
s2 = [[
<![CDATA
Hello world
\]\]>
]]
print(s2)
and the output:
<![CDATA
Hello world
]]>
<![CDATA
Hello world
\]\]>
Way 1 is right, but the output of way 2 is not what I expected. Without the backslashes, Lua reports an error:
lua: execrcise-4.1.lua:7: unexpected symbol near ']'
So my question is: how do I escape brackets in a multi-line string in Lua?
My Lua interpreter version is 5.4.2.
Actually, the whole point of this exercise is for you to find out how to solve this problem, ideally by reading the Lua manual.
There you'll learn that opening and closing brackets for long strings have levels.
Literal strings can also be defined using a long format enclosed by
long brackets. We define an opening long bracket of level n as an
opening square bracket followed by n equal signs followed by another
opening square bracket. So, an opening long bracket of level 0 is
written as [[, an opening long bracket of level 1 is written as [=[,
and so on. A closing long bracket is defined similarly; for instance,
a closing long bracket of level 4 is written as ]====]. A long literal
starts with an opening long bracket of any level and ends at the first
closing long bracket of the same level. It can contain any text except
a closing bracket of the same level.
s2 = [[
<![CDATA
Hello world
]]>
]]
violates the last rule in that quote (a long string cannot contain a closing bracket of its own level): the ]] in ]]> closes the long string prematurely, and the leftover > and ]] cause the syntax error.
So what do you do when the text contains a closing bracket of level 0, ]]? You increase the level of the long string.
s2 = [=[
<![CDATA
Hello world
]]>
]=]
By the way, you cannot escape a square bracket with a backslash in a Lua string.
The only reason you didn't get an error for the invalid escape sequence \] is that long strings ignore escape sequences.
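As a quick check of the points above, here is a minimal sketch (the variable names are only for illustration, and it assumes Lua 5.1 or later). It shows that a level-1 long string can hold ]]> and that a backslash inside a long string stays a literal backslash:

```lua
-- Way 1: a short string with \n escapes.
local s1 = "<![CDATA\n Hello world\n]]>"

-- Way 2: a level-1 long string. The newline right after [=[ is skipped,
-- and the level-0 closing bracket ]] inside it is ordinary text.
local s2 = [=[
<![CDATA
 Hello world
]]>]=]

assert(s1 == s2)

-- Long strings do not interpret escape sequences, so \] stays as two
-- characters: a backslash and a bracket.
local s3 = [[\]\]>]]
assert(s3 == "\\]\\]>")

print(s2)
```

If the text could also contain ]=], a higher level such as [==[ ... ]==] would be needed.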
I am writing a simple scanner in flex. I want my scanner to print out "integer type seen" when it sees the keyword "int". Is there any difference between the following two ways?
1st way:
%%
int printf("integer type seen");
%%
2nd way:
%%
"int" printf("integer type seen");
%%
So, is there a difference between writing if or "if"? Also, for example when we see a == operator, we print something. Is there a difference between writing == or "==" in the flex file?
There's no difference in these specific cases. The quotes (") just tell lex NOT to interpret any special characters (e.g., regular-expression operators) in the quoted string; if no special characters are involved, they don't matter:
[a-z] printf("matched a single letter\n");
"[a-z]" printf("matched the 5-character string '[a-z]'\n");
0* printf("matched zero or more zero characters\n");
"0*" printf("matched a zero followed by an asterisk\n");
Characters that are special and mean something different outside of quotes include . * + ? | ^ $ < > [ ] ( ) { } /. Some of those only have special meaning if they appear at certain places, but it's generally clearer to quote them regardless of where they appear if you want to match the literal characters.
Groovy supports / as a division operator:
groovy> 1 / 2
===> 0.5
It supports / as a string delimiter, which can even be multiline:
groovy> x = /foo/
===> foo
groovy:000> x = /foo
groovy:001> bar/
===> foo
bar
Given this, why can't I evaluate a slashy-string literal in groovysh?
groovy:000> /foo/
groovy:001>
Clearly, groovysh thinks this is unterminated for some reason.
How does groovy avoid getting confused between division and strings? What does this code mean:
groovy> f / 2
Is this a function call f(/2 .../) where / is beginning a multiline slashy-string, or f divided by 2?
How does Groovy distinguish division from strings?
I'm not entirely sure how Groovy does it, but I'll describe how I'd do it, and I'd be very surprised if Groovy didn't work in a similar way.
Most parsing algorithms I've heard of (Shunting-yard, Pratt, etc) recognize two distinct kinds of tokens:
Those that expect to be preceded by an expression (infix operators, postfix operators, closing parentheses, etc). If one of these is not preceded by an expression, it's a syntax error.
Those that do not expect to be preceded by an expression (prefix operators, opening parentheses, identifiers, literals, etc). If one of these is preceded by an expression, it's a syntax error.
To make things easier, from this point onward I'm going to refer to the former kind of token as an operator and the latter as a non-operator.
Now, the interesting thing about this distinction is that it's made not based on what the token actually is, but rather on the immediate context, particularly the preceding tokens. Because of this, the same token can be interpreted very differently depending on its position in the code, and whether the parser classifies it as an operator or a non-operator. For example, the '-' token, if in an operator position, denotes a subtraction, but the same token in a non-operator position is a negation. There is no issue deciding whether a '-' is a subtraction operator or not, because you can tell based on its context.
The same is, in general, true for the '/' character in Groovy. If preceded by an expression, it's interpreted as an operator, which means it's a division. Otherwise, it's a non-operator, which makes it a string literal. So, you can generally tell if a '/' is a division or not, by looking at the token that immediately precedes it:
The '/' is a division if it follows an identifier, literal, postfix operator, closing parenthesis, or other token that denotes the end of an expression.
The '/' begins a string if it follows a prefix operator, infix operator, opening parenthesis, or other such token, or if it begins a line.
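The two rules above can be sketched as a toy scanner. This is not Groovy's actual lexer: classify_slashes and prev_ends_expr are hypothetical names, the sketch is written in Lua only for illustration, and it only knows identifiers, numbers, closing parentheses and single-line slashy strings.

```lua
-- Hypothetical sketch: label each '/' as "division" or "string" by looking
-- at whether the previous token can end an expression.
local function classify_slashes(src)
  local results = {}
  local prev_ends_expr = false  -- nothing precedes the first token
  local i = 1
  while i <= #src do
    local c = src:sub(i, i)
    if c:match("%s") then
      i = i + 1                              -- whitespace changes nothing
    elseif c:match("[%w_]") then
      local _, e = src:find("^[%w_]+", i)    -- identifier or number literal
      i = e + 1
      prev_ends_expr = true
    elseif c == ")" then
      prev_ends_expr = true
      i = i + 1
    elseif c == "/" then
      if prev_ends_expr then
        results[#results + 1] = "division"   -- operator position
        prev_ends_expr = false
        i = i + 1
      else
        results[#results + 1] = "string"     -- non-operator position
        local close = src:find("/", i + 1, true)
        i = (close or #src) + 1              -- skip to the closing slash
        prev_ends_expr = true                -- a literal ends an expression
      end
    else
      prev_ends_expr = false                 -- other operators, '(' and so on
      i = i + 1
    end
  end
  return results
end

print(table.concat(classify_slashes("1 / 2 + /x/"), ", "))
```

Note that this sketch treats f / 2 as a division; it makes no attempt to handle calls with the optional parentheses left out.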
Of course, it isn't quite so simple in practice. Groovy is designed to be flexible in the face of various styles and uses, and therefore things like semicolons or parentheses are often optional. This can make parsing somewhat ambiguous at times. For example, say our parser comes across the following line:
println / foo
This is most likely an attempt to print a multiline string: / foo is the beginning of a string being passed to println as an argument, and the optional parentheses around the argument list are left out. To a simple parser, however, it looks like a division. I expect the Groovy parser can tell the difference by reading ahead to the following lines to see which interpretation does not give an error, but groovysh cannot do that (as a REPL, it doesn't yet have access to the following lines), so it's forced to guess.
Why can't I evaluate a slashy-string literal in groovysh?
As before, I don't know the exact reason, but because groovysh is a REPL, it's bound to have more trouble with the more ambiguous rules. Even so, a simple single-line slashy-string is pretty unambiguous, so I believe something else may be going on here. Here are the results of trying various forms in groovysh:
> /foo - unexpected char: '/' # line 2, column 1.
> /foo/ - awaits further input
> /foo/bar - unexpected char: '/' # line 2, column 1.
> /foo/bar/ - awaits further input
> /foo/ + 'bar' - unexpected char: '/' # line 2, column 1.
> 'foo' + /bar/ - evaluates to 'foobar'
With a space before the opening slash:
>  /foo/ - evaluates to 'foo'
>  /foo - awaits further input
>  /foo/bar - Unknown property: bar
It appears that something strange happens when a '/' character is the first character in a line. The pattern it appears to follow (as far as I can tell) is this:
A slash as the first character of a line begins a strange parsing mode.
In this mode, every line that ends with a slash followed by nothing but whitespace causes the repl to await further lines.
On the first line that ends with something other than a slash (or whitespace following a slash), the error unexpected char: '/' # line 2, column 1. is printed.
I've also noticed a couple of interesting points regarding this:
Both forward slashes (/) and backslashes (\) appear to count, and seem to be completely interchangeable, in this special mode.
This does not appear to happen at all in groovyConsole or in actual Groovy files.
Putting any whitespace before the opening slash character causes groovysh to interpret it correctly, but only if the opening slash is a forward slash, not a backslash.
So, I personally expect that this is just a quirk of groovysh, either a bug or some under-documented feature I haven't heard about.
Is there a way to convert a quoted string to a multiline string?
For example, converting "This string \66 here" to [[This string \66 here]], since I would like escape sequences to be left uninterpreted.
Lua 5.3 Reference Manual 3.1: Lexical Conventions
Literal strings can also be defined using a long format enclosed by
long brackets. We define an opening long bracket of level n as an
opening square bracket followed by n equal signs followed by another
opening square bracket. So, an opening long bracket of level 0 is
written as [[, an opening long bracket of level 1 is written as [=[,
and so on. A closing long bracket is defined similarly; for instance,
a closing long bracket of level 4 is written as ]====]. A long literal
starts with an opening long bracket of any level and ends at the first
closing long bracket of the same level. It can contain any text except
a closing bracket of the same level. Literals in this bracketed form
can run for several lines, do not interpret any escape sequences, and
ignore long brackets of any other level. Any kind of end-of-line
sequence (carriage return, newline, carriage return followed by
newline, or newline followed by carriage return) is converted to a
simple newline.
For convenience, when the opening long bracket is immediately followed
by a newline, the newline is not included in the string.
That's all you need to know about long strings.
It does not make much sense to convert a string defined with quotes, "some string", to a string like [[some string]]: neither the quotes nor the square brackets are part of the string, and the string itself is the same.
The only differences are that a newline directly after the opening long bracket is ignored and that escape sequences are not interpreted.
Quotes and square brackets are only part of the string if you have nested strings. In this case conversion also doesn't make much sense because you cannot nest strings with quotes like strings with brackets.
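A minimal sketch of those points, with illustrative variable names and assuming an ASCII-compatible character set (so that code 66 is "B"):

```lua
-- Quotes and long brackets are only delimiters: the resulting value is the same.
assert("some string" == [[some string]])

-- Difference 1: escape sequences. In a quoted string \66 is the character
-- with decimal code 66 ("B"); in a long string it stays as written.
local quoted = "This string \66 here"
local long = [[This string \66 here]]
assert(quoted == "This string B here")
assert(long == "This string \\66 here")

-- Difference 2: a newline directly after the opening long bracket is dropped.
local with_newline = [[
first line]]
assert(with_newline == "first line")

print(quoted)
print(long)
```

Once the literal has been read, the value no longer remembers which delimiters were used, so there is nothing left to convert at run time.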
Maybe your whole approach is a bit off?
Are you looking for something like this?
local db = "google"
local tbl = "accounts"
local where = "field = 'VALUE' AND TRUE"
local order = "id DESC"
local query = string.format([[
SELECT *
FROM `%s`.`%s`
WHERE %s
ORDER BY %s
]], db, tbl, where, order)
In the Lua interpreter:
>print("This is a string
>>spread over multiline")
stdin:1: unfinished string near '"This is a'
We know that in the Lua interpreter a statement can be continued over multiple lines.
For example:
>a=2
>a=a+
>>1
This works perfectly
Again:
>print([[This is a multiline
>>string]])
This is a multiline
string
This works fine, so why does the first print() statement produce an error?
Read the fine Reference Manual:
3.1 – Lexical Conventions
[…]
A short literal string can be delimited by matching single or double
quotes, and can contain the following C-like escape sequences: '\a' (bell),
'\b' (backspace), '\f' (form feed), '\n' (newline), '\r' (carriage
return), '\t' (horizontal tab), '\v' (vertical tab), '\\' (backslash),
'\"' (quotation mark [double quote]), and '\'' (apostrophe [single
quote]). A backslash followed by a line break results in a newline in the
string. The escape sequence '\z' skips the following span of white-space
characters, including line breaks; it is particularly useful to break and
indent a long literal string into multiple lines without adding the newlines
and spaces into the string contents. A short literal string cannot contain
unescaped line breaks nor escapes not forming a valid escape sequence.
[…]
Literal strings can also be defined using a long format enclosed by long
brackets. We define an opening long bracket of level n as an opening
square bracket followed by n equal signs followed by another opening
square bracket. So, an opening long bracket of level 0 is written as [[,
an opening long bracket of level 1 is written as [=[, and so on. A
closing long bracket is defined similarly; for instance, a closing long
bracket of level 4 is written as ]====]. A long literal starts with an
opening long bracket of any level and ends at the first closing long
bracket of the same level. It can contain any text except a closing
bracket of the same level. Literals in this bracketed form can run for
several lines, do not interpret any escape sequences, and ignore long
brackets of any other level. Any kind of end-of-line sequence (carriage
return, newline, carriage return followed by newline, or newline followed
by carriage return) is converted to a simple newline.
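Applied to the string from the question, the quoted rules give several ways to get a line break into a literal. This is a minimal sketch with illustrative variable names; the \z form assumes Lua 5.2 or later.

```lua
-- 1. Short string with an explicit \n escape.
local a = "This is a string\nspread over multiline"

-- 2. Short string with a backslash directly before the line break.
local b = "This is a string\
spread over multiline"

-- 3. \z skips the following whitespace (line break and indentation),
--    so the newline has to come from the \n before it.
local c = "This is a string\n\z
           spread over multiline"

-- 4. Long string: raw line breaks are allowed.
local d = [[This is a string
spread over multiline]]

assert(a == b and b == c and c == d)
print(d)
```

An unescaped line break inside a short string is a lexical error, which is why the interpreter does not wait for a continuation line in the first print() example.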
function writeFloat([=[==[===[====["game.exe"+XXXXXXXX]+XXX====]+XXX===]+XXX==]+XXX=]+XXX, trackbar_getPosition(TRAINERFORM_CETrackBar1))
end
gives me the error
[string "--code..."]:4: unfinished long string near
Lua has "long strings", which are induced by the syntax of [=*[, where "=*" means "zero or more = characters". So [[ begins a long string, as does [==[ or [=[, as in your case.
A long string is so named because it accepts every character between the inducing syntax and the terminating syntax. This allows you to do useful things like add verbatim XML, C++, or even Lua code within your Lua script as a literal string.
The terminating syntax is ]=*], where "=*" means the exact same number of = characters that was used to induce the long string. So if you start with [=[, the long string will only end with ]=]. ]] and ]====] or any other terminus will not end the long string; they'll be taken verbatim into the string.
So this:
local lit = [=[Long String]==]=]
Results in lit taking the value Long String]==.
Your code never contains a ]=] sequence. It has ====] and similar things, but none of them starts with a ] character.
It is illegal to start a long string that never ends in a Lua script. Hence the compile error.
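A minimal sketch of the matching rule, with illustrative variable names. The compile helper is an assumption made for portability: loadstring exists in Lua 5.1, while load accepts source strings from Lua 5.2 on.

```lua
-- A level-1 long string ends only at ]=]; the ]==] inside is plain text.
local lit = [=[Long String]==]=]
assert(lit == "Long String]==")

-- A level-2 long string may contain brackets of any other level.
local nested = [==[a [=[b]=] ]] c]==]
assert(nested == "a [=[b]=] ]] c")

-- Compiling source whose long string is never closed fails.
local compile = loadstring or load
local chunk, err = compile("local s = [=[ ====] ===] ==] =] ")
assert(chunk == nil)
print(err)
```

The exact wording of the error message differs between Lua versions, so the sketch only prints it.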