My input is a recursive structure that looks like this (notice the blank second line):
xxx #{} yyy #{ zzz #{} wwww }
As I see it, the grammar that reads it should look like this:
start = item+
item = thing / space
thing = '#{' item* '}'
space = (!'#' .)+
But what I get is:
Line 2, column 1: Expected "#{", "}", or any character but end of input found.
What am I doing wrong?
I do not know PEG at all, but a quick look at the docs suggests that the dot in the fourth rule is the problem. The online parser succeeds with:
start = item+
item = thing / space
thing = '#{' item* '}'
space = [ a-z]+
This produces:
[
[
"x",
"x",
"x",
" "
],
[
"#{",
[],
"}"
],
[
" ",
"y",
"y",
"y",
" "
],
[
"#{",
[
[
" ",
"z",
"z",
"z",
" "
],
[
"#{",
[],
"}"
],
[
" ",
"w",
"w",
"w",
"w",
" "
]
],
"}"
]
]
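To make it run, I changed the space rule so that it also stops at '}'. The original (!'#' .)+ accepts every character except '#', so it swallows the closing brace and thing never finds its '}':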
start = item+
item = thing / space
thing = '#{' item* '}'
space = [^#}]+
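To see why excluding '}' from the space rule matters, here is a minimal hand-written recursive-descent sketch of the same grammar in Python. The function names `parse` and `parse_items` are made up for illustration and are not part of any PEG tool; unlike the online parser, the sketch returns each run of text as one string rather than a list of characters.

```python
def parse_items(text, pos=0):
    """Parse item* starting at pos; return (items, new_pos)."""
    items = []
    while pos < len(text):
        if text.startswith("#{", pos):
            # thing = '#{' item* '}'
            inner, pos = parse_items(text, pos + 2)
            if not text.startswith("}", pos):
                raise SyntaxError(f"expected '}}' at offset {pos}")
            items.append(inner)
            pos += 1
        elif text[pos] in "#}":
            # space = [^#}]+ cannot start here, so item* ends
            break
        else:
            # space = [^#}]+ : consume up to the next '#' or '}'
            end = pos
            while end < len(text) and text[end] not in "#}":
                end += 1
            items.append(text[pos:end])
            pos = end
    return items, pos


def parse(text):
    """start = item+ : the whole input must be consumed."""
    items, pos = parse_items(text)
    if pos != len(text) or not items:
        raise SyntaxError(f"unexpected input at offset {pos}")
    return items


print(parse("xxx #{} yyy #{ zzz #{} wwww }"))
```

If the inner loop stopped only at '#', the first run of text after `#{` would run past the closing brace, which is the behaviour behind the original error message.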
I was trying to write a simple text encryptor, but the script only works if I put a space after every symbol.
Code:
local text = ""
local tdext = text:gsub("%S+", {["+"] = "a", ["×"] = "b", ["÷"] = "c", ["="] = "d", ["/"] = "e", ["_"] = "f", ["€"] = "g", ["¥"] = "h", ["₩"] = "i", ["!"] = "j", ["#"] = "k", ["#"] = "l", ["$"] = "m", ["%"] = "n", ["^"] = "o", ["&"] = "p", ["*"] = "q", ["("] = "r", [")"] = "s", ["-"] = "t", ["'"] = "u", [":"] = "v", [";"] = "w", [","] = "x", ["?"] = "y", ["."] = "z", [" "] = " "})
print(tdext)
I tried fixing it, but it doesn't do what it should.
If I put "÷ =" in the text variable, it outputs "c d", but if I put "÷=" in the variable, it outputs "÷=".
Let's take a closer look at your substitutions:
local subs = {
["+"] = "a", ["×"] = "b", ["÷"] = "c", ["="] = "d", ["/"] = "e",
["_"] = "f", ["€"] = "g", ["¥"] = "h", ["₩"] = "i", ["!"] = "j",
["#"] = "k", ["#"] = "l", ["$"] = "m", ["%"] = "n", ["^"] = "o",
["&"] = "p", ["*"] = "q", ["("] = "r", [")"] = "s", ["-"] = "t",
["'"] = "u", [":"] = "v", [";"] = "w", [","] = "x", ["?"] = "y",
["."] = "z", [" "] = " "
}
local tdext = text:gsub("%S+", subs)
%S+ matches a sequence of one or more non-space bytes. If every symbol stands alone, whether multi-byte (UTF-8) or single-byte (ASCII), this works fine. However, if you have a run of several characters (say, +-), the replacement is not performed, because the whole match +- is looked up as a single key and that key is not in your table. The same applies to the multi-byte ÷=: ÷ = works, because the characters are separated by spaces; ÷= doesn't, because the pattern greedily matches the whole sequence.
If this is supposed to be a character-wise substitution, you'll need to match individual characters (UTF-8 sequences, which include ASCII). Lua 5.3 and later provide the "constant" utf8.charpattern, a pattern string that matches exactly one UTF-8 character, assuming the subject is valid UTF-8. If you have a recent Lua version, the fix becomes trivial: just replace "%S+" with utf8.charpattern:
local tdext = text:gsub(utf8.charpattern, subs)
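In older Lua versions (up to and including 5.2), you'll have to write this pattern yourself, using decimal escapes. The NUL byte is written as %z, and since a class such as %z cannot be the endpoint of a range inside a set, the ASCII range is spelled %z followed by \1-\127:
local charpattern = "[%z\1-\127\194-\244][\128-\191]*"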
local tdext = text:gsub(charpattern, subs)
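To make the byte ranges concrete, here is a small Python sketch (Python is used only for illustration; `split_utf8` is a made-up helper name). It applies the same idea to the encoded bytes: one lead byte from 0x00-0x7F or 0xC2-0xF4, followed by any number of continuation bytes from 0x80-0xBF.

```python
import re

# One lead byte, then zero or more continuation bytes: the same ranges
# that a single-UTF-8-character pattern has to describe.
UTF8_CHAR = re.compile(rb"[\x00-\x7F\xC2-\xF4][\x80-\xBF]*")


def split_utf8(text):
    """Split a string into characters by matching on its UTF-8 bytes."""
    encoded = text.encode("utf-8")
    return [chunk.decode("utf-8") for chunk in UTF8_CHAR.findall(encoded)]


print(split_utf8("÷=€"))
```

Each match is one complete character, so "÷=" is seen as two separate lookup keys instead of one unknown key.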
Alternatively, if you also want to support multi-character substitutions, you can simply apply the substitutions one by one (which is, however, less efficient by a factor linear in the number of entries in the subs table):
-- Escape every non-alphanumeric character so Lua treats the string as a literal pattern
local function escape_pattern(str)
  -- the extra parentheses drop gsub's second return value (the match count)
  return (str:gsub("%W", "%%%0"))
end
local tdext = text
for from, to in pairs(subs) do
  -- in a replacement string only "%" is special, so escape just that
  tdext = tdext:gsub(escape_pattern(from), (to:gsub("%%", "%%%%")))
end
Say, I have the following EBNF:
document = content , { content } ;
content = hello world | answer | space ;
hello world = "hello" , space , "world" ;
answer = "42" ;
space = " " ;
This lets me parse something like:
hello world 42
Now I want to extend this grammar with a block comment. How can I do this properly?
If I start simple:
document = content , { content } ;
content = hello world | answer | space | comment;
hello world = "hello" , space , "world" ;
answer = "42" ;
space = " " ;
comment = "/*" , ?any character? , "*/" ;
I cannot parse:
Hello /* I'm the taxman! */ World 42
If I extend the grammar further with the special case from above, it gets ugly, but parses.
document = content , { content } ;
content = hello world | answer | space | comment;
hello world = "hello" , { comment } , space , { comment } , "world" ;
answer = "42" ;
space = " " ;
comment = "/*" , ?any character? , "*/" ;
But I still cannot parse something like:
Hel/*p! I need somebody. Help! Not just anybody... */lo World 42
How would I do this with an EBNF grammar? Or is it not even possible at all?
Assuming you would consider "hello" as a token, you would not want anything to break that up. Should you need to do so, it becomes necessary to explode the rule:
hello_world = "h", {comment}, "e", {comment}, "l", {comment}, "l", {comment}, "o" ,
{ comment }, space, { comment },
"w", {comment}, "o", {comment}, "r", {comment}, "l", {comment}, "d" ;
Considering the broader question, it seems commonplace to not describe language comments as part of the formal grammar, but to instead make it a side note. However, it can generally be done by treating the comment as equivalent to whitespace:
space = " " | comment ;
You may also want to consider adding a rule to describe consecutive whitespace:
spaces = { space }- ;
Cleaning up your final grammar, but treating "hello" and "world" as tokens (i.e. not allowing them to be broken apart), could result in something like this:
document = { content }- ;
content = hello world | answer | space ;
hello world = "hello" , spaces , "world" ;
answer = "42" ;
spaces = { space }- ;
space = " " | comment ;
comment = "/*" , ?any character? , "*/" ;
How would I do this with an EBNF grammar? Or is it not even possible at all?
Some languages remove comments in a preprocessing step; others replace each comment with a space. Removing the comments seems the easiest solution to this problem. However, it would also remove comment-like text from inside literals, which normally is not wanted.
document = preprocess, process;
preprocess = {(? any character ? - comment, ? append char to text ?)},
? text for input to process ?;
comment = "/*", {? any character ? - "*/"}, "*/", ? discard ?;
process = {content}-;
content = hello world | answer | spaces;
hello world = ("H" | "h"), "ello", spaces, ("W" | "w") , "orld";
answer = "42";
spaces = {" "}-;
Given
Hello /* I'm the taxman! */ World 42
the preprocessor produces
Hello  World 42
Notice the two spaces.
And for
Hel/*p! I need somebody. Help! Not just anybody... */lo World 42
it produces
Hello World 42
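The preprocess step can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the EBNF: `strip_comments` is a made-up name, comments are assumed not to nest, and (as noted above) comment-like text inside literals would be removed too.

```python
import re

# "/*", then as few characters as possible (including newlines), then "*/".
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_comments(text):
    """Delete every block comment; the rest of the text is left untouched."""
    return COMMENT.sub("", text)


print(strip_comments("Hello /* I'm the taxman! */ World 42"))
print(strip_comments("Hel/*p! I need somebody. Help! Not just anybody... */lo World 42"))
```

An unterminated "/*" matches nothing here and is passed through unchanged, so the later parsing stage would be the one to reject it.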
Question
How to wrap header above (inserted by add_header_above())?
There is a simple way to do it for a single-layer header, but it doesn't work when there is a second (or third) header layer.
Reproducible example
library(kableExtra)
names(iris) <- c("L", "W", "L", "W", " ")
iris[1:2, ] %>%
kable("latex") %>%
add_header_above(
c(
"Sepal is great" = 2,
"Petal is better, (in fac my favorite)" = 2,
"nc" = 1)
) %>%
column_spec(2:ncol(iris), width = "0.3in")
Current output: (image not available)
Expected output from R code, roughly: (image not available)
As I said in "Best Practice for newline in LaTeX table", if you need newlines inside any kableExtra function, just use \n. Otherwise, you can try out the linebreak function.
library(kableExtra)
names(iris) <- c("L", "W", "L", "W", " ")
iris[1:2, ] %>%
kable("latex") %>%
add_header_above(
c(
"Sepal\nis great" = 2,
"Petal is better,\n(in fac my favorite)" = 2,
"nc" = 1)
) %>%
column_spec(2:ncol(iris), width = "0.3in")
Is there any way to combine these last two formatting lines of code into one?
str = "1, 2, 3, 4, 5, "
str = str:gsub("%p", {[","] = " >" }) -- replaces each ',' with ' >'
str = string.sub(str, 1, #str - 2) -- removes the last two characters
Thanks in advance :)
str = "1, 2, 3, 4, 5, "
str = str:sub(1, #str-2):gsub("%p", {[","] = " >" })
This will do what you want it to do.
Egor's is a bit more elegant, though:
str = str:gsub(',',' >'):sub(1,-4)
I'm trying to figure out how to dump a text block from an HDF5 file (a Bathymetric Attributed Grid / BAG). When I do h5dump -d /BAG_root/metadata H11703_Office_5m.bag, and anything else I've tried, I always get the data with each character of the XML quoted. Is there an "easy" option to have it dump the raw data contents to a file or the terminal?
DATASET "/BAG_root/metadata" {
DATATYPE H5T_STRING {
STRSIZE 1;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 5097 ) / ( H5S_UNLIMITED ) }
DATA {
(0): "<", "?", "x", "m", "l", " ", "v", "e", "r", "s", "i", "o", "n", "=",
(14): """, "1", ".", "0", """, "?", ">", "
", "<", "s", "m",
(25): "X", "M", "L", ":", "M", "D", "_", "M", "e", "t", "a", "d", "a",
Marcus Cole emailed me this solution after I brought up the topic on the OpenNavSurf mailing list:
h5dump -b FILE -o H12279_VB_4m_MLLW_1of1.xml -d BAG_root/metadata H12279_VB_4m_MLLW_1of1.bag
This writes out a clean XML file.
Re: Python & BAG, GDAL 1.7.0+ supports the BAG format; e.g.:
from osgeo import gdal
bag = gdal.OpenShared(r"C:\DATA\NGDC\H11555_2m_1.bag")
bagmetadata = bag.GetMetadata("xml:BAG")[0]
The data is stored as an array of 5097 single-character strings (STRSIZE 1). To dump the text directly, it would have to be stored as a real string (e.g. in a scalar dataspace).
So I think you cannot do it with h5dump alone, you probably have to process the dump with sed or your favorite regexp tool.
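As a rough sketch of such post-processing, the following Python joins the quoted characters back into text. It assumes the dump looks exactly like the excerpt in the question: one character per pair of double quotes, a literal newline appearing inside its quotes, and a quote character shown as `"""`. The function name `join_h5dump_chars` is made up, and other h5dump versions or options may format or escape the data differently.

```python
import re


def join_h5dump_chars(dump_text):
    """Rebuild the text from an h5dump listing of single-character strings."""
    # Look only at the DATA { ... } section so quoted names in the header are skipped.
    start = dump_text.find("DATA {")
    body = dump_text[start + len("DATA {"):] if start != -1 else dump_text
    # Each element is exactly one character between double quotes. DOTALL lets
    # that character be a newline; "." also matches a quote, as in """.
    return "".join(re.findall(r'"(.)"', body, flags=re.DOTALL))


sample = '''DATA {
(0): "<", "?", "x", "m", "l", " ", "v", "e", "r", "s", "i", "o", "n", "=",
(14): """, "1", ".", "0", """, "?", ">", "
", "<", "s", "m",
'''
print(join_h5dump_chars(sample))
```

The index prefixes such as (0): and (14): contain no quotes, so they are skipped without special handling.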