Parsing string input for keywords followed by content - parsing

I'm trying to parse some string input but I'm struggling to see the solution. However, this must be a well-known pattern-- it's just one I don't encounter frequently.
Background: I have a short list of string keywords ("HEAD", "GET", "POST", "PUT"), each of which is followed by additional string data. The sequence can occur multiple times, in any order ("KEYWORD blah blah blah KEYWORD blah blah blah"). There are no termination characters or ending keywords as XML would have -- there's either a new occurrence of a keyword clause or the end of the input. Sample:
str: {HEAD stuff here GET more stuff here POST other stuff here GET even more stuff here PUT still more stuff here POST random stuff}
The output I'd like to achieve:
results: [
"HEAD" ["stuff here"]
"GET" ["more stuff here" "even more stuff here"]
"POST" ["other stuff here" "random stuff"]
"PUT" ["still more stuff here"]
]
My poor attempt at this is:
results: ["head" [] "get" [] "post" [] "put" []]
rule1: ["HEAD" (r: "head") | "GET" (r: "get") | "POST" (r: "post") | "PUT" (r: "put")]
rule2: [to "HEAD" | to "GET" | to "POST" | to "PUT" | to end]
parse/all str [
some [
start: rule1 rule2 ending:
(offs: offset? start ending
append select results r trim copy/part start offs
) :ending
| skip]
]
I know that rule2 is the clunker -- the "to" operator is not the right way to think about this pattern; it skips to the next occurrence of the first available keyword in that rule block, when I want it to stop at whichever of the keywords comes next.
Any tips would be appreciated.

How about this...
;; parse rules
keyword: [{HEAD} | {GET} | {POST} | {PUT}]
content: [not keyword skip]
;; prep results block... ["HEAD" [] "GET" [] "POST" [] "PUT" []]
results: []
forskip keyword 2 [append results reduce [keyword/1 make block! 0]]
parse/case str [
any [
copy k keyword copy c some content (
append results/:k trim c
)
]
]
With your str, results will then contain what you wanted:
["HEAD" ["stuff here"] "GET" ["more stuff here" "even more stuff here"] "POST" ["other stuff here" "random stuff"] "PUT" ["still more stuff here"]]
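For readers without Rebol at hand, the same idea (split on any of the keywords, then file each following chunk under its keyword) can be sketched in Python. This is only an illustration, not a translation of the parse rules above: the names KEYWORDS and group_by_keyword are made up for this example, and the \b word boundaries are an added assumption so that a keyword embedded in a longer word does not split the input.

```python
import re

KEYWORDS = ["HEAD", "GET", "POST", "PUT"]  # hypothetical name; the question's keywords

def group_by_keyword(text):
    # The capturing group makes re.split keep the keywords in the result:
    # ['', 'HEAD', ' stuff here ', 'GET', ' more stuff here ', ...]
    pattern = r"\b(" + "|".join(KEYWORDS) + r")\b"
    parts = re.split(pattern, text)
    results = {kw: [] for kw in KEYWORDS}
    # Odd positions hold keywords; the even position after each holds its content.
    for kw, content in zip(parts[1::2], parts[2::2]):
        results[kw].append(content.strip())
    return results

s = ("HEAD stuff here GET more stuff here POST other stuff here "
     "GET even more stuff here PUT still more stuff here POST random stuff")
results = group_by_keyword(s)
```

Anything before the first keyword lands in parts[0] and is ignored, which matches the "skip until a keyword" behaviour the question was after.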

Maybe not so elegant, but this works even with Rebol2:
results: ["HEAD" [] "GET" [] "POST" [] "PUT" []]
keyword: [{HEAD} | {GET} | {POST} | {PUT}]
parse/case str [
any [
[copy k keyword c1: ] | [skip c2:]
[[keyword | end] (
append results/:k trim copy/part c1 c2
) :c2 |
]
]
]

Here is another variant.
str: {HEAD stuff here GET more stuff here POST other stuff here GET even more stuff here PUT still more stuff here POST random stuff}
results: ["HEAD" [] "GET" [] "POST" [] "PUT" []]
possible-verbs: [ "HEAD" | "GET" | "POST" | "PUT" | end ]
parse/all str [
some [
to possible-verbs
verb-start: (verb: first split verb-start " ")
possible-verbs
copy text to possible-verbs
(if not none? verb [ append results/:verb trim text ])
]
]
probe results
Again, not perfect in terms of elegance and similar in approach.

Related

What does "(.+)"}], mean in Lua pattern matching?

I got this code that takes a webhook message, but I don't understand the pattern matching behind it:
function getContent(link)
cmd = io.popen('powershell -command "[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12; Write-Host (Invoke-WebRequest -Uri "'..link..'").Content"')
return cmd:read("*all"):match('"description": "(.+)"}],')
end
function blabla(a)
link= webhookLink.."/messages/"..messageId
text = [[
$webHookUrl = "]]..link..[["
$embedObject = @{
description = "]]..getContent(link):gsub("\\([nt])", {n="\n", t="\t"})..[[`n]]..a..[["
}
$embedArray = @($embedObject)
$payload = @{
embeds = $embedArray
}
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
Invoke-RestMethod -Uri $webHookUrl -Body ($payload | ConvertTo-Json -Depth 4) -Method Patch -ContentType 'application/json'
]]
file = io.popen("powershell -command -", "w")
file:write(text)
file:close()
end
My question is: what does "(.+)"}], mean in the getContent function, and what does ]]..getContent(link):gsub("\\([nt])", {n="\n", t="\t"})..[[`n]]..a..[[ mean in the blabla function?
https://gitspartv.github.io/lua-patterns/
This can help you greatly with pattern matching.
(.+) means capture any character (not just digits), repeated one or more times.
example: "ABC(.+)" would return everything after "ABC"
return cmd:read("*all"):match('"description": "(.+)"}],')
Looks like this gets everything inside "description": "THIS_IS_RETURNED"}],
Let's break up the pattern: "description": "(.+)"}],
"description": " - literal string that needs to match
(.+) - one or more of any character, greedy (+), captured
"}], - another literal string
That is, this pattern will match the first substring that starts with "description": ", followed by one or more characters which are captured, followed by "}],.
All in all, this pattern is a very unreliable implementation of extracting a certain string from what's probably JSON; you should use a proper JSON parser instead. This pattern will fail in all of the following cases:
Empty String: [[{"description": ""}],null] (might be intended)
Another String later on: [{"foo": [{"description": "bar"}], "baz": "world"}], null], which would match as bar"}], "baz": "world due to + doing greedy matching.
. means any character, and + means one or more. I suggest you read up on the basics of patterns. For example: http://lua-users.org/wiki/PatternsTutorial

Adding data-pos info to latex output

As described here, Pandoc records Synctex-like information when the source is commonmark+sourcepos. For example, with this commonmark input,
---
title: "Sample"
---
This is a sample document.
the output in native format starts like this:
Pandoc
Meta
{ unMeta =
fromList [ ( "title" , MetaInlines [ Str "Sample" ] ) ]
}
[ Div
( "" , [] , [ ( "data-pos" , "Sample.knit.md#5:1-6:1" ) ] )
[ Para
[ Span
( ""
, []
, [ ( "data-pos" , "Sample.knit.md#5:1-5:5" ) ]
)
[ Str "This" ]
, Span
( ""
, []
, [ ( "data-pos" , "Sample.knit.md#5:5-5:6" ) ]
)
[ Space ]
, Span
( ""
, []
, [ ( "data-pos" , "Sample.knit.md#5:6-5:8" ) ]
)
[ Str "is" ]
but all that appears in the .tex file is this:
{This}{ }{is}...
As a step towards Synctex support, I'd like to insert the data-pos information as LaTeX markup, i.e. change the .tex output to look like this:
{This\datapos{Sample.knit.md#5:1-5:5}}{ \datapos{Sample.knit.md#5:5-5:6}}{is\datapos{Sample.knit.md#5:6-5:8}}...
This looks like something a Lua filter could accomplish pretty easily: look for the data-pos records, copy the location information into the Str record. However, I don't know Lua or Pandoc native language. Could someone help with this? Doing it for the Span records would be enough for my purposes. I'm using Pandoc 2.18 and Lua 5.4.
Here is an attempt that appears to work. Comments or corrections would still be welcome!
Span = function(span)
local datapos = span.attributes['data-pos']
if datapos then
table.insert(span.content, pandoc.RawInline('tex', "\\datapos{" .. datapos .. "}"))
end
return span
end
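For comparison, the same transformation can be sketched as a JSON filter in Python using only the standard library. This is an illustration, not a replacement for the Lua filter: it assumes pandoc's JSON AST layout, where a Span is {"t": "Span", "c": [attr, inlines]} and attr is [id, classes, [[key, value], ...]]. The function name add_datapos is made up for this example.

```python
def add_datapos(node):
    # Walk the JSON AST in place; append a raw TeX \datapos{...} inline
    # to every Span that carries a data-pos attribute.
    if isinstance(node, list):
        for child in node:
            add_datapos(child)
    elif isinstance(node, dict):
        if node.get("t") == "Span":
            attr, inlines = node["c"]
            add_datapos(inlines)  # handle nested Spans first
            pos = dict(attr[2]).get("data-pos")
            if pos is not None:
                inlines.append(
                    {"t": "RawInline", "c": ["tex", "\\datapos{" + pos + "}"]}
                )
        else:
            for child in node.values():
                add_datapos(child)
    return node

# Hand-built fragment mirroring the first Span of the native output above.
span = {
    "t": "Span",
    "c": [["", [], [["data-pos", "Sample.knit.md#5:1-5:5"]]],
          [{"t": "Str", "c": "This"}]],
}
doc = {"blocks": [{"t": "Para", "c": [span]}]}
add_datapos(doc)
```

A real filter would read the document with json.load(sys.stdin), call add_datapos on it, and write it back with json.dump to stdout, so it can be passed to pandoc's --filter option.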

Transform request data in Krakend with Lua

I'm using Krakend as an API gateway.
I have an endpoint configured in krakend.json:
"endpoint":"/call",
"extra_config":{
"github.com/devopsfaith/krakend-lua/proxy":{
"sources":[
"/function.lua"
],
"pre":"pre_backend(request.load())",
"live":true,
"allow_open_libs":true
}
},
"method":"POST",
"output_encoding":"json",
"headers_to_pass":[
"*"
],
"backend":[
{
"url_pattern":"/api/v1/get_client_id",
[...]]
},
The endpoint "/api/v1/get_client_id" receives just one parameter:
{"user_mail_1":"test#test.es"}
With the Lua script, I want my endpoint "/call" to receive:
{"email":"test#test.es"}
and transform it before sending into:
{"user_mail_1":"test#test.es"}
I tried gsub, but treating body() as a plain string is not effective.
function pre_backend( req )
print('--Backend response, pre-logic:');
local r = req;
r:params('test','test');
r:query('lovelyquery')
r:body('test','test');
local v = r:body():gsub('email', 'user_mail_1')
...
Is there a way to parse "req" as a table, dict, or something similar so that I can transform the data?
Is there another way to transform the request data?
EXAMPLE WORKING WITH GSUB:
function pre_backend( req )
print('--Backend response, pre-logic:');
print('--req');
print(req);
print(type(req));
local r = req;
print('--body');
print(type(r:body()));
print(r:body())
local body_transformed = r:body():gsub('email', 'user_mail_1');
print('--body_transformed');
print(body_transformed);
print(type(body_transformed));
end
Console output:
2022/02/11 09:59:52 DEBUG: [http-server-handler: no extra config]
--Backend response, pre-logic:
--req
userdata: 0xc0004f9b60
userdata
--body
string
{"email" : "test#test.es","test_field":"email"}
--body_transformed
{"user_mail_1" : "test#test.es","test_field":"user_mail_1"}
string
As we can see, gsub is not effective here because it replaces every occurrence of the string.
If I could work with req as a table, dict, or something similar, I could replace a key/value directly, e.g. req['xxx'] = 'xxx', or iterate over req.keys.
gsub stands for global substitution. It replaces all occurrences of the pattern in the string.
If you just want to replace "email" in front of an email address, simply use a pattern that takes this into account.
print((r:body():gsub('(")(email)("%s-:%s-"%w+#%w+%.%w+")', "%1user_mail_1%3")))
Alternatively, if you know that you only want to replace the first occurrence of email, you can simply do this:
print((r:body():gsub("email", "user_mail_1", 1)))
The third parameter will stop gsub after the first replacement.
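The difference between the three strategies (replace everything, replace only the first hit, rename the key after decoding the JSON) can be seen in a small sketch. It is written in Python for illustration only; whether a JSON decoder is available inside the Krakend Lua sandbox is not something this example assumes or shows. The body string is the one from the console output above.

```python
import json
import re

body = '{"email" : "test#test.es","test_field":"email"}'

# Global replacement: also rewrites the VALUE of test_field (the problem above).
replace_all = body.replace("email", "user_mail_1")

# Count-limited replacement, like gsub's third argument: only the first hit.
first_only = re.sub("email", "user_mail_1", body, count=1)

# Decoding the body and renaming the key touches keys only, never values.
data = json.loads(body)
data["user_mail_1"] = data.pop("email")
renamed = json.dumps(data)
```

The count-limited version depends on the key appearing before any value that contains the same text; the decode-and-rename version does not depend on ordering at all.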

Combining lists into tuples in list comprehension

A = [ [1,2,3],[4,5,6]].
B = [ [a,b,c],[d,e,f]].
The output should be:
[ [{1,a},{2,b},{3,c}],[{4,d},{5,e},{6,f}]].
This is what I have got so far.
Input: [ [{Y} || Y<-X ] || X<-A].
Output: [[{1},{2},{3}],[{4},{5},{6}]]
I think this is what you need:
[lists:zip(LA, LB) || {LA, LB} <- lists:zip(A, B)].
You need to zip both lists to be able to work with their elements together.
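The same two-level zip may be easier to see in a sketch outside Erlang. This Python version is only an illustration: the atoms a..f are written as strings and the tuples are Python tuples.

```python
A = [[1, 2, 3], [4, 5, 6]]
B = [["a", "b", "c"], ["d", "e", "f"]]

# The outer zip pairs up the sublists; the inner zip pairs up their elements.
pairs = [list(zip(la, lb)) for la, lb in zip(A, B)]
```

One difference to keep in mind: Python's zip silently truncates to the shorter input, whereas Erlang's lists:zip/2 fails if the two lists differ in length.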

Capitalize first letter of every word in Lua

I'm able to capitalize the first letter of my string using:
str:gsub("^%l", string.upper)
How can I modify this to capitalize the first letter of every word in the string?
I wasn't able to find any fancy way to do it.
str = "here you have a long list of words"
str = str:gsub("(%l)(%w*)", function(a,b) return string.upper(a)..b end)
print(str)
This code outputs Here You Have A Long List Of Words. %w* could be changed to %w+ to leave one-letter words untouched.
Fancier solution:
str = string.gsub(" "..str, "%W%l", string.upper):sub(2)
It's impossible to do this as a real single-regex replace because Lua's pattern system is simple.
With the alternative answer listed above, you get inconsistent results for words containing apostrophes:
str = string.gsub(" "..str, "%W%l", string.upper):sub(2)
will capitalize the first letter after each apostrophe, regardless of whether it is the first letter of the word.
E.g. "here's a long list of words" outputs "Here'S A Long List Of Words".
To fix this, I found a clever solution that uses this code:
function titleCase( first, rest )
return first:upper()..rest:lower()
end
string.gsub(str, "(%a)([%w_']*)", titleCase)
This fixes the apostrophe problem.
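The apostrophe problem and the fix can be reproduced side by side in a short sketch. It uses Python's re module for illustration, with patterns chosen to mirror the two Lua patterns ("%W%l" and "(%a)([%w_']*)"); the function name title_case is made up for this example.

```python
import re

text = "here's a long list of words"

# Mirrors "%W%l": uppercase any lowercase letter that follows a non-word
# character (or the start of the string). The apostrophe counts as non-word.
naive = re.sub(r"(?:^|\W)[a-z]", lambda m: m.group(0).upper(), text)

def title_case(s):
    # Mirrors "(%a)([%w_']*)": a letter plus any run of word characters or
    # apostrophes is one word, so "here's" is capitalized only once.
    return re.sub(
        r"([A-Za-z])([\w']*)",
        lambda m: m.group(1).upper() + m.group(2).lower(),
        s,
    )

fixed = title_case(text)
```

Here naive comes out as Here'S A Long List Of Words, while fixed is Here's A Long List Of Words.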
I have a feeling I will be returning to this question when I need to put something in proper title case.
Below is the Lua code to do exactly that.
It has the disadvantage of not preserving the original spacing between words but it's good enough for now.
-- Lua is like python in syntax, and barebones like C -_-
function Set (list)
local set = {}
for _, l in ipairs(list) do set[l] = true end
return set
end
function firstToUpper(str)
return (str:gsub("^%l", string.upper))
end
function titlecase(str)
-- We need to break the string into pieces
local words = {}
for word in string.gmatch(str, '([^%s]+)') do
table.insert(words, word)
end
-- We need to capitalize anything that is not a:
-- - Article
-- - Coordinating Conjunction
-- - Preposition
-- Thus we have a blacklist of such words
local blacklist = Set {
"at", "but", "by", "down", "for", "from",
"in", "into", "like", "near", "of", "off",
"on", "onto", "out", "over", "past", "plus",
"to", "up", "upon", "with", "nor", "yet",
"so", "the"
}
for index, word in ipairs(words) do
if not blacklist[word] then
words[index] = firstToUpper(word)
end
end
-- First and last words are always capitalized
words[1] = firstToUpper(words[1])
words[#words] = firstToUpper(words[#words])
-- Join the words with a single space
local result = table.concat(words, " ")
return result
end
print(titlecase("the world"))
print(titlecase("I walked my dog this morning ..."))
print(titlecase("The art of Lua"))
--- Output:
----------------------
--- The World
--- I Walked My Dog This Morning ...
--- The Art of Lua
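The spacing limitation mentioned above can be avoided by rewriting words in place instead of splitting and rejoining. The following is a rough Python sketch of that variation, not a port of the Lua code: the names MINOR_WORDS and titlecase_keep_spacing are made up here, and the word list is copied from the blacklist above.

```python
import re

MINOR_WORDS = frozenset({
    "at", "but", "by", "down", "for", "from", "in", "into", "like", "near",
    "of", "off", "on", "onto", "out", "over", "past", "plus", "to", "up",
    "upon", "with", "nor", "yet", "so", "the",
})

def titlecase_keep_spacing(text):
    words = list(re.finditer(r"\S+", text))  # each run of non-space characters
    last = len(words) - 1
    out, pos = [], 0
    for i, m in enumerate(words):
        out.append(text[pos:m.start()])      # copy the original whitespace as-is
        word = m.group()
        # First and last words are always capitalized; minor words are not.
        if i in (0, last) or word not in MINOR_WORDS:
            word = word[:1].upper() + word[1:]
        out.append(word)
        pos = m.end()
    out.append(text[pos:])                   # trailing whitespace, if any
    return "".join(out)

spaced = titlecase_keep_spacing("the  art of   lua")
```

With this input, spaced keeps the double and triple spaces of the original string while still leaving "of" in lower case.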
