What does "(.+)"}], mean in Lua pattern matching?

I have this code that fetches a webhook message, but I don't understand the pattern matching behind it:
function getContent(link)
cmd = io.popen('powershell -command "[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12; Write-Host (Invoke-WebRequest -Uri "'..link..'").Content"')
return cmd:read("*all"):match('"description": "(.+)"}],')
end
function blabla(a)
link= webhookLink.."/messages/"..messageId
text = [[
$webHookUrl = "]]..link..[["
$embedObject = @{
description = "]]..getContent(link):gsub("\\([nt])", {n="\n", t="\t"})..[[`n]]..a..[["
}
$embedArray = @($embedObject)
$payload = @{
embeds = $embedArray
}
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
Invoke-RestMethod -Uri $webHookUrl -Body ($payload | ConvertTo-Json -Depth 4) -Method Patch -ContentType 'application/json'
]]
file = io.popen("powershell -command -", "w")
file:write(text)
file:close()
end
My question is: what do "(.+)"}], in the getContent function and ]]..getContent(link):gsub("\\([nt])", {n="\n", t="\t"})..[[`n]]..a..[[ in the blabla function mean?

https://gitspartv.github.io/lua-patterns/
This can help you greatly with pattern matching.
In short, (.+) captures one or more of any character.
example: "ABC(.+)" would return everything after "ABC"
return cmd:read("*all"):match('"description": "(.+)"}],')
It looks like this returns everything inside the description, i.e. the THIS_IS_RETURNED part of "description": "THIS_IS_RETURNED"}],

Let's break up the pattern: "description": "(.+)"}],
"description": " - literal string that needs to match
(.+) - one or more of any character, greedy (+), captured
"}], - another literal string
That is, this pattern will match the first substring that starts with "description": ", followed by one or more characters which are captured, followed by "}],.
All in all, this pattern is a very unreliable implementation of extracting a certain string from what's probably JSON; you should use a proper JSON parser instead. This pattern will fail in all of the following cases:
Empty String: [[{"description": ""}],null] (might be intended)
Another String later on: [{"foo": [{"description": "bar"}], "baz": "world"}], null], which would match as bar"}], "baz": "world due to + doing greedy matching.
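Both failure cases are easy to reproduce. The sketch below uses Python's re module rather than Lua patterns, on the assumption that the comparison is fair because .+ is greedy in both. The payload in `valid` and its `embeds` key are hypothetical; the question does not show what the webhook actually returns.

```python
import json
import re

# Direct translation of the Lua pattern; } and ] are escaped for Python's re.
PATTERN = re.compile(r'"description": "(.+)"\}\],')

later = '[{"foo": [{"description": "bar"}], "baz": "world"}], null]'
empty = '[[{"description": ""}],null]'

# Greedy .+ runs to the LAST occurrence of "}], so it over-captures.
greedy = PATTERN.search(later).group(1)

# .+ needs at least one character, so an empty description does not match.
no_match = PATTERN.search(empty)

# A JSON parser is affected by neither problem (hypothetical payload shape).
valid = '{"embeds": [{"description": "bar"}], "baz": "world"}'
description = json.loads(valid)["embeds"][0]["description"]

print(repr(greedy), no_match, description)
```

In Lua the same idea would mean decoding the response with a JSON library and indexing into the resulting table instead of calling match on the raw text.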

. means any character, and + means one or more. I suggest you read up on the basics of patterns, for example: http://lua-users.org/wiki/PatternsTutorial

Related

Implement heredocs with trim indent using PEG.js

I'm working on a language similar to Ruby called Gaiman, and I'm using PEG.js to generate the parser.
Do you know if there is a way to implement heredocs with proper indentation?
xxx = <<<END
hello
world
END
the output should be:
"hello
world"
I need this because this code doesn't look very nice:
def foo(arg) {
  if arg == "here" then
    return <<<END
xxx
xxx
END
  end
end
this is a function where the user wants to return:
"xxx
xxx"
I would prefer the code to look like this:
def foo(arg) {
  if arg == "here" then
    return <<<END
      xxx
      xxx
      END
  end
end
If I trim all the lines, the user will not be able to use a string with leading spaces when they want to. Does anyone know if PEG.js allows this?
I don't have any code for heredocs yet; I just want to be sure that what I want is possible.
EDIT:
So I've tried to implement heredocs and the problem is that PEG doesn't allow back-references.
heredoc = "<<<" marker:[\w]+ "\n" text:[\s\S]+ marker {
return text.join('');
}
It says that the marker is not defined. As for trimming, I think I can use the location() function.
I don't think that's a reasonable expectation for a parser generator; few if any would be equal to the challenge.
For a start, recognising the here-string syntax is inherently context-sensitive, since the end-delimiter must be a precise copy of the delimiter provided after the <<< token. So you would need a custom lexical analyser, and that means that you need a parser generator which allows you to use a custom lexical analyser. (So a parser generator which assumes you want a scannerless parser might not be the optimal choice.)
Recognising the end of the here-string token shouldn't be too difficult, although you can't do it with a single regular expression. My approach would be to use a custom scanning function which breaks the here-string into a series of lines, concatenating them as it goes until it reaches a line containing only the end-delimiter.
Once you've recognised the text of the literal, all you need to normalise the spaces in the way you want is the column number at which the <<< starts. With that, you can trim each line in the string literal. So you only need a lexical scanner which accurately reports token position. Trimming wouldn't normally be done inside the generated lexical scanner; rather, it would be the associated semantic action. (Equally, it could be a semantic action in the grammar. But it's always going to be code that you write.)
When you trim the literal, you'll need to deal with the cases in which it is impossible, because the user has not respected the indentation requirement. And you'll need to do something with tab characters; getting those right probably means that you'll want a lexical scanner which computes visible column positions rather than character offsets.
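The scanning and trimming steps described above can be sketched independently of any parser generator. This is a minimal Python illustration, not PEG.js code: the function name `read_heredoc` and its parameters are hypothetical, the caller is assumed to already know how many columns to strip, and tabs are deliberately not handled (every character counts as one column).

```python
def read_heredoc(lines, marker, indent):
    """Collect heredoc lines up to `marker` and strip `indent` leading columns.

    `lines` are the source lines that follow the `<<<MARKER` line; `indent`
    is the number of columns to remove, e.g. derived from a token position.
    """
    body = []
    for line in lines:
        if line.strip() == marker:  # a line containing only the end delimiter
            break
        body.append(line)
    else:
        raise SyntaxError(f"unterminated heredoc, expected {marker!r}")

    trimmed = []
    for line in body:
        # Anything other than spaces inside the margin means the user
        # did not respect the indentation requirement.
        if line[:indent].strip(" "):
            raise SyntaxError(f"{line!r} is indented less than {indent} columns")
        trimmed.append(line[indent:])
    return "\n".join(trimmed)


source = ["    hello", "      world", "    END", "rest of program"]
text = read_heredoc(source, "END", 4)
print(repr(text))
```

Raising an error for under-indented lines is only one possible policy; a real implementation might instead fall back to the smallest indentation found in the body.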
I don't know if peg.js corresponds with those requirements, since I don't use it. (I did look at the documentation, and failed to see any indication as to how you might incorporate a custom scanner function. But that doesn't mean there isn't a way to do it.) I hope that the discussion above at least lets you check the detailed documentation for the parser generator you want to use, and otherwise find a different parser generator which will work for you in this use case.
Here is an implementation of heredocs in Peggy, the successor to PEG.js, which is no longer maintained. This code was based on the GitHub issue.
heredoc = "<<<" begin:marker "\n" text:($any_char+ "\n")+ _ end:marker (
&{ return begin === end; }
/ '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
) {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`\\s{${min}}`);
return text.map(line => {
return line[0].replace(re, '');
}).join('\n');
}
any_char = (!"\n" .)
marker_char = (!" " !"\n" .)
marker "Marker" = $marker_char+
_ "whitespace"
= [ \t\n\r]* { return []; }
EDIT: the grammar above didn't work when another piece of code followed the heredoc; here is a better grammar:
{ let heredoc_begin = null; }
heredoc = "<<<" beginMarker "\n" text:content endMarker {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`^\\s{${min}}`, 'mg');
return {
type: 'Literal',
value: text.replace(re, '')
};
}
__ = (!"\n" !" " .)
marker 'Marker' = $__+
beginMarker = m:marker { heredoc_begin = m; }
endMarker = "\n" " "* end:marker &{ return heredoc_begin === end; }
content = $(!endMarker .)*
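The margin-stripping step in the action above (a multi-line regex replace anchored at the start of each line) can be tried in isolation. This is a Python sketch of that one step, with a hypothetical helper name `strip_margin`; it matches literal spaces rather than \s, which is a deliberate deviation from the grammar above.

```python
import re


def strip_margin(text, columns):
    # With re.MULTILINE, ^ anchors at the start of every line,
    # comparable to the 'm' flag used in the grammar's RegExp.
    return re.sub(rf"^ {{{columns}}}", "", text, flags=re.MULTILINE)


raw = "    xxx\n\n      xxx"
print(repr(strip_margin(raw, 4)))
```

The reason for using a literal space: in Python, at least, \s also matches a newline, so a pattern like ^\s{4} can swallow an empty line together with part of the following line's indentation. It may be worth checking whether the same happens with the JavaScript RegExp.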

Split string into list of words and separators

Given this string:
one#two*three#four#five*
What is a fast solution to extract the list of Pairs?
Each pair contains the word with its separator character like this:
[
['one', '#'],
['two', '*'],
['three', '#'],
['four', '#'],
['five', '*']
]
Specifically in my case I want to use both white space and new line characters as separators.
You'd need a regular expression:
(\w+)([#*])
See example Dart code here that should get you going: https://dartpad.dartlang.org/ae3897b2221a94b5a4c9e6929bebcfce
Full disclosure: Dart is a relatively new language to me.
That said, regex might be your best bet. Assuming you are only working with lowercase a-z letters followed by a single character, this should do the trick.
void main() {
  final r = RegExp(r'([a-z]+)(.)');
  // Build one [word, separator] pair per match.
  final pairs = [
    for (final m in r.allMatches('one#two*three#four#five*'))
      [m.group(1), m.group(2)],
  ];
  print(pairs);
}
Based on other responses here's my solution for white spaces and new lines as separators:
void main() {
  RegExp r = RegExp(r"(\S+)([\s]+|$)");
  var text = 'one two three \n\n four ';
  var matches = r.allMatches(text);
  List<dynamic> l = [];
  matches.toList().asMap().forEach((i, m) => l.add([m.group(1), m.group(2)]));
  print(l);
}
Output
[[one, ], [two, ], [three,
], [four, ]]
Explanation: https://regex101.com/r/cRpMVq/2
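For comparison, here is the same two-group idea as a Python sketch rather than Dart; the helper name `pairs` is hypothetical. It runs both the #/* variant and the whitespace variant from the answers above.

```python
import re


def pairs(text, pattern):
    # findall returns one tuple per match, with one element per capturing group.
    return [list(groups) for groups in re.findall(pattern, text)]


print(pairs("one#two*three#four#five*", r"(\w+)([#*])"))
print(pairs("one two three \n\n four ", r"(\S+)(\s+|$)"))
```

The `|$` alternative lets the last word match even when no separator follows it, in which case the separator is an empty string.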

Openresty : getting issue with ngx.re.gsub to handle magic characters of lua

I want to replace a word in my body content with another string.
To implement this I am using ngx.re.gsub, but I am getting a weird issue: ngx.re.gsub is not handling magic characters.
Example :
content1 = "HiTestHello Test how are you Testall "
_ssi = "Test"
body = "$100.00"
content2 = ngx.re.gsub(content1, _ssi, body)
ngx.print(content2)
output is
Hi.00lHelo .00 how are you .00all he.00llo .00 how are you .00all
while the output should be:
Hi$100.00Hello $100.00 how are you $100.00all
Please let me know how I can achieve this.
In ngx.re replacement strings, $1, $2, etc. refer to captured groups. Try escaping the $ character:
body = "$$100.00"
Wrapping the body in a function also avoids it:
content1 = "HiTestHello Test how are you Testall "
_ssi = "Test"
body = "$100.00"
content2 = ngx.re.gsub(content1, _ssi, function()
return body
end, "o")
ngx.print(content2)

Lua String Split

Hi I've got this function in JavaScript:
function blur(data) {
var trimdata = trim(data);
var dataSplit = trimdata.split(" ");
var lastWord = dataSplit.pop();
var toBlur = dataSplit.join(" ");
}
What this does is take a string such as "Hello my name is bob" and return
toBlur = "Hello my name is" and lastWord = "bob"
Is there a way I can rewrite this in Lua?
You could use Lua's pattern matching facilities:
function blur(data)
    return string.match(data, "^(.*)[ ][^ ]*$")
end
How does the pattern work?
^      # start matching at the beginning of the string
(      # open a capturing group ... what is matched inside will be returned
.*     # as many arbitrary characters as possible
)      # end of capturing group
[ ]    # a single literal space (you could omit the square brackets, but I think
       # they increase readability)
[^ ]*  # match anything BUT literal spaces... as many as possible
$      # marks the end of the input string
So [ ][^ ]*$ has to match the last word and the preceding space. Therefore, (.*) will return everything in front of it.
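The backtracking behaviour is the same in most regex engines, so it can be checked outside Lua. This is a Python sketch under that assumption; unlike the Lua pattern above it adds a second capturing group so the last word is returned as well, and the single-word fallback is my own choice, not part of the answer.

```python
import re


def blur(data):
    data = data.strip()
    # Greedy (.*) first takes the whole string, then backtracks to the last space.
    match = re.match(r"^(.*)[ ]([^ ]*)$", data)
    if match is None:  # no space at all: the input is a single word
        return "", data
    return match.group(1), match.group(2)


to_blur, last_word = blur("Hello my name is bob")
print(repr(to_blur), repr(last_word))
```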
For a more direct translation of your JavaScript, first note that there is no split function in Lua. There is table.concat though, which works like join. Since you have to do the splitting manually, you'll probably use a pattern again:
function blur(data)
    local words = {}
    for m in string.gmatch(data, "[^ ]+") do
        words[#words+1] = m
    end
    words[#words] = nil -- pops the last word
    return table.concat(words, " ")
end
gmatch does not give you a table right away, but an iterator over all matches instead. So you add them to your own temporary table, and call concat on that. words[#words+1] = ... is a Lua idiom to append an element to the end of an array.

Parsing blocks of line comments using MGrammar

How can I parse blocks of line comments with MGrammar?
I want to parse blocks of line comments. Line comments that are next to each other should be grouped in the MGraph output.
I'm having trouble grouping blocks of line comments together. My current grammar uses "\r\n\r\n" to terminate a block but that will not work in all cases such as at end of file or when I introduce other syntaxes.
Sample input could look like this:
/// This is block
/// number one

/// This is block
/// number two
My current grammar looks like this:
module MyModule
{
language MyLanguage
{
syntax Main = CommentLineBlock*;
token CommentContent = !(
'\u000A' // New Line
|'\u000D' // Carriage Return
|'\u0085' // Next Line
|'\u2028' // Line Separator
|'\u2029' // Paragraph Separator
);
token CommentLine = "///" c:CommentContent* => c;
syntax CommentLineBlock = (CommentLine)+ "\r\n\r\n";
interleave Whitespace = " " | "\r" | "\n";
}
}
The problem is that you interleave all whitespace, so after the tokens have been parsed and handed on from the lexer, those characters just "don't exist" anymore.
CommentLineBlock is syntax in your case, but you need the comment-blocks to be completely consumed in tokens...
language MyLanguage
{
syntax Main = CommentLineBlock*;
token LineBreak = '\u000D\u000A'
| '\u000A' // New Line
|'\u000D' // Carriage Return
|'\u0085' // Next Line
|'\u2028' // Line Separator
|'\u2029' // Paragraph Separator
;
token CommentContent = !(
'\u000A' // New Line
|'\u000D' // Carriage Return
|'\u0085' // Next Line
|'\u2028' // Line Separator
|'\u2029' // Paragraph Separator
);
token CommentLine = "//" c:CommentContent*;
token CommentLineBlock = c:(CommentLine LineBreak?)+ => Block {c};
interleave Whitespace = " " | "\r" | "\n";
}
But then the problem is that the sub-token rules in CommentLine won't be processed; you get plain strings parsed.
Main[
[
Block{
"/// This is block\r\n/// number one\r\n"
},
Block{
"/// This is block\r\n/// number two"
}
]
]
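If post-processing outside the grammar is acceptable, the intended grouping is straightforward to express by hand. This is a plain Python sketch, not MGrammar; the function name `comment_blocks` is hypothetical, and it assumes that any line which is not a /// comment ends the current block, which also covers the end-of-file case from the question.

```python
def comment_blocks(source):
    """Group adjacent /// comment lines into blocks of comment text."""
    blocks, current = [], []
    # str.splitlines also splits on \r\n, \u0085, \u2028 and \u2029.
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("///"):
            current.append(stripped[3:].strip())
        elif current:  # a blank or non-comment line closes the block
            blocks.append(current)
            current = []
    if current:  # block that runs up to the end of the input
        blocks.append(current)
    return blocks


sample = "/// This is block\r\n/// number one\r\n\r\n/// This is block\r\n/// number two"
print(comment_blocks(sample))
```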
I might try to find a nicer way tonight :-)
