The grep implementation from 'The Practice of Programming' - grep

I was going through the grep.c file (at https://www.cs.princeton.edu/~bwk/tpop.webpage/grep.c) from Kernighan & Pike's book The Practice of Programming.
If I search for regexp
^c*
in a file which contains nothing but
d
it will send these to match() function which will send ("c*", "d") to matchhere function.
matchhere sends ('c', "\0", "d") to matchstar which in turn sends ("\0", "d") to matchhere function. This will return 1 to the main grep() function
Where am I taking the values incorrectly?

Returning 1 is the correct answer. The regular expression ^c* matches zero or more copies of c at the beginning of a line, and a file containing d certainly matches that; in fact, every line of every file will match that.

Related

Trying to get some kind of key:value data from a string in Lua

I'm (again) stuck because patterns... so let's see if with a little of help... The case is I have e. g. a string returned by a function that contains the following:
📄 My Script
ScriptID:RL_SimpleTest
Version:0.0.1
ScriptType:MenuScript
AnotherKey:AnotherValue
And, maybe, some more text...
And I'd want to parse it line by line and should the line contains a ":" get the left side content of the line in a variable (k) and the right content in another one (v), so e. g. I'd have k containing "ScriptID" and v containing "RL_SimpleTest" for the second line (the first one should be just ignored) and so on...
Well, I've started with something like this:
function RL_Test:StringToKeyValue(str, sep1, sep2)
sep1 = sep1 or "\n"
sep2 = sep2 or ":"
local t = {}
for line in string.gmatch(str, "([^" .. sep1 .. "]+)") do
print(line)
for k in string.gmatch(line, "([^" .. sep2 .. "]+)") do --Here is where I'm lost trying to get the key/value pair separately and at the same time...
--t[k] = v
print(k)
end
end
return t
end
With the hope once I got isolated the line containing the data in the key:value form that I want to extract, I'd be able to do some kind of for k, v in string.gmatch(line, "([^" .. sep2 .. "]+)") or something so and that way get the two pieces of data, but of course it doesn't work and even though I have a feeling it's a triviality I don't know even where to start, always for the lack of patterns understanding...
Well, I hope at least I exposed it right... Thanks in advance for any help.
local t = {}
for line in (s..'\n'):gmatch("(.-)\r?\n") do
for a, b in line:gmatch("([^:]+):([^:\n\r]+)") do
t[a] = b
end
end
The pattern is quite simple. Match anything that is not a colon that is followed by a colon that is followed by anything that is not a colon or a line break. Put what you want in captures and you're done.
I assume every line is of the format k:v, containing exactly one colon, or containing no colon (no k/v pair).
Then you can simply first match nonempty lines using [^\n]+ (assuming UNIX LF line endings), then match each line using ^([^:]+):([^:]+)$. Breakdown of the second pattern:
^ and $ are anchors. They force the pattern to match the entire line.
([^:]+) matches & captures one or more non-semicolon characters.
This leaves you with:
function RL_Test:StringToKeyValue(str)
local t = {}
for line in str:gmatch"[^\n]+" do
local k, v = line:match"^([^:]+):([^:]+)$"
if k then -- line is k:v pair?
t[k] = v
end
end
return t
end
If you want to support Windows CRLF line endings, use for line in (s..'\n'):gmatch'(.-)\r?\n' do as in Piglet's answer for matching the lines instead.
This answer differs from Piglet's answer in that it uses match instead of gmatch for matching the k/v pairs, allowing exactly one k/v pair with exactly one colon per line, whereas Piglet's code may extract multiple k/v pairs per line.

How do I get a list of strings that are between 2 strings in Lua?

I'm trying to write a HTML parser in Lua. A major roadblock I hit almost immediately was I have no idea how to get a list of strings between 2 strings. This is important for parsing HTML, where a tag is defined by being within 2 characters (or strings of length 1), namely '<' and '>'. I am aware of this answer, but it only gets the first occurence, not all instances of a string between the 2 given strings.
What I mean by "list of strings between 2 strings" is something like this:
someFunc("<a b c> <c b a> a </c b a> </a b c>", "<", ">")
Returns:
{"a b c", "c b a", "/c b a", "/a b c"}
This does not have to parse newlines nor text in between tags, as both of those can be handled using extra logic. However, I would prefer it if it did parse newlines, so I can run the code once for the whole string returned by the first GET request.
Note: This is a experiment project to see if this is possible in the very limited Lua environment provided by the CC: Tweaked mod for Minecraft. Documentation here: https://tweaked.cc/
You can do simply:
local list = {}
local html = [[bla]]
for innerTag in html:gmatch("<(.-)>") do
list[#list] = innerTag
end
But be aware that it is weak as doesn't validade wrong things, as someone can put an < or > inside a string etc.

Lua Pattern matching only returning first match

I can't figure out how to get Lua to return ALL matches for a particular pattern match.
I have the following regex which works and is so basic:
.*\n
This just splits a long string per line.
The equivelent of this in Lua is:
.-\n
If you run the above in a regex website against the following text it will find three matches (if using the global flag).
Hello
my name is
Someone
If you do not use the global flag it will return only the first match. This is the behaviour of LUA; it's as if it does not have a global switch and will only ever return the first match.
The exact code I have is:
local test = {string.match(string_variable_here, ".-\n")}
If I run it on the above test for example, test will be a table with only one item (the first row). I even tried using capture groups but the result is the same.
I cannot find a way to make it return all occurrences of a match, does anyone know if this is possible in LUA?
Thanks,
You can use string.gmatch(s, pattern) / s:gmatch(pattern):
This returns a pattern finding iterator. The iterator will search through the string passed looking for instances of the pattern you passed.
See the online Lua demo:
local a = "Hello\nmy name is\nSomeone\n"
for i in string.gmatch(a, ".*\n") do
print(i)
end
Note that .*\n regex is equivalent to .*\n Lua pattern. - in Lua patterns is the equivalent of *? non-greedy ("lazy") quantifier.

How to replace some characters of input file, before it getting lexed in flex?

How to replace all occurrences of some character or char-sequence with some other character or char-sequence, before flex lexes it. For example I want B\65R to match identifier rule as it is equivalent to BAR in my grammar. So, essentially I want to turn a sequence of \dd into its equivalent ascii character and then lex it. (\65 -> A, \66 -> B, …).
I know, I can first search the entire file for a sequence of \dd and replace it with equivalent character and then feed it to flex. But I wonder if there exists a better way. Something like writing a rule that matches \dd and then replacing it with corresponding alternative in the input stream, so that, I don't have to parse entire file twice.
Several options...
Next, flex is going to read from a filter that
substitutes "\dd" by "chr(dd)" (untested).
You could run something along the lines of
YYIN = popen("perl -pe 's/\\(\d\d)/chr($1)/e' ", "r");
yylex()....

lua string.match does't match as expected(different with other language)

a = "stackoverflow.com/questions/ask"
print(string.match(a,"(.*/)")) -- stackoverflow.com/questions/
print(string.match(a,"(.*/).*")) -- stackoverflow.com/questions/
I can't understand the second result. In my option it should be "stackoverflow.com/questions/ask" as "(.*/)" matches "stackoverflow.com/questions/" and ".*" matches "ask". Can someone tell me WHY the second result is "stackoverflow.com/questions/" ? Does x = string.match(a,"(.*/).*") and x = string.match(a,"(.*/)") are same?
the () means you have used Captures.so maybe you can use it like this:
print(string.match(a,"((.*/).*)"))
Captures:
A pattern can contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the substrings of the subject string that match captures are stored (captured) for future use. Captures are numbered according to their left parentheses. For instance, in the pattern "(a*(.)%w(%s*))", the part of the string matching "a*(.)%w(%s*)" is stored as the first capture (and therefore has number 1); the character matching "." is captured with number 2, and the part matching "%s*" has number 3.

Resources