Lua String Split

Hi, I've got this function in JavaScript:
function blur(data) {
var trimdata = data.trim();
var dataSplit = trimdata.split(" ");
var lastWord = dataSplit.pop();
var toBlur = dataSplit.join(" ");
return [toBlur, lastWord];
}
It takes a string such as "Hello my name is bob" and returns
toBlur = "Hello my name is" and lastWord = "bob".
Is there a way I can rewrite this in Lua?

You could use Lua's pattern matching facilities:
function blur(data)
-- everything before the last space (nil if data contains no space)
return string.match(data, "^(.*)[ ][^ ]*$")
end
How does the pattern work?
^ # start matching at the beginning of the string
( # open a capturing group ... what is matched inside will be returned
.* # as many arbitrary characters as possible
) # end of capturing group
[ ] # a single literal space (you could omit the square brackets, but I think
# they increase readability)
[^ ]* # match anything BUT literal spaces... as many as possible
$ # marks the end of the input string
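So [ ][^ ]*$ has to match the last word and the space before it. Because .* is greedy, the space left over for [ ] is the last one in the string, so (.*) returns everything in front of it. If the string contains no space at all, string.match returns nil.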
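The one-liner above returns only the text before the last word. As a hedged sketch (not the answerer's code), the same pattern idea can return both values the JavaScript computes. The trim step is an assumption meant to mirror JavaScript's trim(), and the no-space case is handled the way split(" ").pop() would handle it.

```lua
-- Sketch: return both halves, like the JavaScript version (toBlur, lastWord).
local function blur(data)
  -- Trim leading/trailing whitespace, similar to JavaScript's String.prototype.trim()
  local trimmed = data:match("^%s*(.-)%s*$")
  -- Greedy (.*) backs off to the LAST space; (%S+) is the final word
  local toBlur, lastWord = trimmed:match("^(.*) (%S+)$")
  if not toBlur then
    -- No space at all: like split(" ").pop(), the whole string is the last word
    return "", trimmed
  end
  return toBlur, lastWord
end

local toBlur, lastWord = blur("Hello my name is bob")
print(toBlur)   --> Hello my name is
print(lastWord) --> bob
```

Like the JavaScript, this splits on the last single space character, so "Hello  bob" (two spaces) yields "Hello " and "bob".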
For a more direct translation of your JavaScript, first note that there is no built-in split function in Lua. There is table.concat, though, which works like join. Since you have to do the splitting manually, you'll probably use a pattern again:
function blur(data)
local words = {}
for m in string.gmatch(data, "[^ ]+") do
words[#words+1] = m
end
words[#words] = nil -- pops the last word
return table.concat(words, " ")
end
gmatch does not give you a table right away, but an iterator over all matches instead. So you add them to your own temporary table, and call concat on that. words[#words+1] = ... is a Lua idiom to append an element to the end of an array.
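The loop above discards the popped word. A minimal sketch that keeps it, assuming you want both values back: table.remove called without a position removes and returns the last element, which is the closest standard-library analogue of JavaScript's pop().

```lua
-- Sketch: the gmatch loop, but returning the popped word as well.
local function blur(data)
  local words = {}
  -- "[^ ]+" skips empty fields, so runs of spaces collapse (unlike JS split(" "))
  for w in string.gmatch(data, "[^ ]+") do
    words[#words + 1] = w
  end
  -- table.remove(t) removes and returns the last element; nil if t is empty
  local lastWord = table.remove(words)
  return table.concat(words, " "), lastWord
end

local toBlur, lastWord = blur("Hello my name is bob")
print(toBlur)   --> Hello my name is
print(lastWord) --> bob
```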

Related

Implement heredocs with trim indent using PEG.js

I'm working on a language similar to Ruby called gaiman, and I'm using PEG.js to generate the parser.
Do you know if there is a way to implement heredocs with proper indentation?
xxx = <<<END
hello
world
END
the output should be:
"hello
world"
I need this because this code doesn't look very nice:
def foo(arg) {
  if arg == "here" then
    return <<<END
xxx
xxx
END
  end
end
this is a function where the user wants to return:
"xxx
xxx"
I would prefer the code to look like this:
def foo(arg) {
  if arg == "here" then
    return <<<END
      xxx
      xxx
    END
  end
end
If I trim every line, users will not be able to write a string with leading spaces when they want one. Does anyone know if PEG.js allows this?
I don't have any code yet for heredocs, just want to be sure if something that I want is possible.
EDIT:
So I've tried to implement heredocs and the problem is that PEG doesn't allow back-references.
heredoc = "<<<" marker:[\w]+ "\n" text:[\s\S]+ marker {
return text.join('');
}
It says that marker is not defined. As for trimming, I think I can use the location() function.
I don't think that's a reasonable expectation for a parser generator; few if any would be equal to the challenge.
For a start, recognising the here-string syntax is inherently context-sensitive, since the end-delimiter must be a precise copy of the delimiter provided after the <<< token. So you would need a custom lexical analyser, and that means that you need a parser generator which allows you to use a custom lexical analyser. (So a parser generator which assumes you want a scannerless parser might not be the optimal choice.)
Recognising the end of the here-string token shouldn't be too difficult, although you can't do it with a single regular expression. My approach would be to use a custom scanning function which breaks the here-string into a series of lines, concatenating them as it goes until it reaches a line containing only the end-delimiter.
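The line-by-line scanning approach described above can be sketched as follows. It is written in Lua to match the rest of this page, not in PEG.js, and scan_heredoc is a hypothetical helper rather than any parser generator's API. It assumes the delimiter is a run of word characters and that the closing line may be indented.

```lua
-- Sketch of a hand-written here-string scanner (hypothetical helper).
-- src: the whole source text; pos: index of the first character after "<<<".
-- Returns the body lines (unmodified) and the index just past the end-delimiter line.
local function scan_heredoc(src, pos)
  -- The delimiter is a run of word characters followed by a newline
  local marker, after = src:match("^(%w+)\n()", pos)
  if not marker then
    error("expected a delimiter after <<<")
  end
  local lines = {}
  local i = after
  while i <= #src do
    -- One line (without its "\n") plus the index where the next line starts
    local line, nxt = src:match("^([^\n]*)\n?()", i)
    -- A line containing only the delimiter (surrounding spaces allowed) ends the literal
    if line:match("^%s*(.-)%s*$") == marker then
      return lines, nxt
    end
    lines[#lines + 1] = line
    i = nxt
  end
  error("unterminated here-string: missing " .. marker)
end

local src = "x = <<<END\n    hello\n    world\n    END\nprint(x)"
local pos = select(2, src:find("<<<", 1, true)) + 1
local body, rest = scan_heredoc(src, pos)
-- body is { "    hello", "    world" }; src:sub(rest) is "print(x)"
```

Note that the end-delimiter check compares against the captured marker at run time, which is exactly the context-sensitive step a plain grammar rule cannot express.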
Once you've recognised the text of the literal, all you need to normalise the spaces in the way you want is the column number at which the <<< starts. With that, you can trim each line in the string literal. So you only need a lexical scanner which accurately reports token position. Trimming wouldn't normally be done inside the generated lexical scanner; rather, it would be the associated semantic action. (Equally, it could be a semantic action in the grammar. But it's always going to be code that you write.)
When you trim the literal, you'll need to deal with the cases in which it is impossible, because the user has not respected the indentation requirement. And you'll need to do something with tab characters; getting those right probably means that you'll want a lexical scanner which computes visible column positions rather than character offsets.
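The trimming step can be sketched as a small semantic action, again in Lua for consistency with this page. dedent is a hypothetical name; it assumes spaces only (no tab handling) and strips column - 1 characters, where column is the 1-based column at which <<< starts, raising an error when a line breaks the indentation requirement.

```lua
-- Sketch: normalise the indentation of a here-string body, given the 1-based
-- column where "<<<" starts. Assumption: spaces only; tabs are not handled.
local function dedent(lines, column)
  local width = column - 1
  local prefix = string.rep(" ", width)
  local out = {}
  for n, line in ipairs(lines) do
    if line:match("^%s*$") then
      out[n] = ""                      -- whitespace-only lines become empty
    elseif line:sub(1, width) == prefix then
      out[n] = line:sub(width + 1)     -- any extra indentation is kept
    else
      -- The user did not respect the indentation requirement
      error(("line %d is indented less than %d spaces"):format(n, width))
    end
  end
  return table.concat(out, "\n")
end

print(dedent({ "    xxx", "      yyy" }, 5))
--> xxx
-->   yyy
```

Keeping indentation beyond the required prefix is what still lets users write strings with leading spaces.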
I don't know whether peg.js meets those requirements, since I don't use it. (I did look at the documentation, and failed to see any indication of how you might incorporate a custom scanner function. But that doesn't mean there isn't a way to do it.) I hope that the discussion above at least lets you check the detailed documentation for the parser generator you want to use, or else find a different parser generator that will work for you in this use case.
Here is an implementation of heredocs in Peggy, the successor to PEG.js, which is no longer maintained. This code is based on a GitHub issue.
heredoc = "<<<" begin:marker "\n" text:($any_char+ "\n")+ _ end:marker (
&{ return begin === end; }
/ '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
) {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`\\s{${min}}`);
return text.map(line => {
return line[0].replace(re, '');
}).join('\n');
}
any_char = (!"\n" .)
marker_char = (!" " !"\n" .)
marker "Marker" = $marker_char+
_ "whitespace"
= [ \t\n\r]* { return []; }
EDIT: the grammar above didn't work when more code followed the heredoc; here is a better grammar:
{ let heredoc_begin = null; }
heredoc = "<<<" beginMarker "\n" text:content endMarker {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`^\\s{${min}}`, 'mg');
return {
type: 'Literal',
value: text.replace(re, '')
};
}
__ = (!"\n" !" " .)
marker 'Marker' = $__+
beginMarker = m:marker { heredoc_begin = m; }
endMarker = "\n" " "* end:marker &{ return heredoc_begin === end; }
content = $(!endMarker .)*

Lua unusual variable name (question mark variable)

I have stumbled upon this line of code and I am not sure what the ['?'] part represents (my guess is that it's some sort of wildcard, but I searched for a while and couldn't find anything):
['?'] = function() return is_canadian and "eh" or "" end
I understand that RHS is a functional ternary operator. I am curious about the LHS and what it actually is.
Edit: reference (2nd example):
http://lua-users.org/wiki/SwitchStatement
Actually, it is quite simple.
local t = {
a = "aah",
b = "bee",
c = "see",
It maps each letter to its pronunciation. Here, a is pronounced aah, b is pronounced bee, and so on. Some letters are pronounced differently in American English and Canadian English, so not every letter can be mapped to a single sound.
z = function() return is_canadian and "zed" or "zee" end,
['?'] = function() return is_canadian and "eh" or "" end
In the mapping, the letter z and the symbol ? are pronounced differently in American English and Canadian English. When the program looks up the pronunciation of '?', it calls a function that checks whether the user wants Canadian English and returns either "eh" or "" (the function for z returns "zed" or "zee" in the same way).
Finally, the following two notations have the same meaning:
local t1 = {
a = "aah",
b = "bee",
["?"] = "bee"
}
local t2 = {
["a"] = "aah",
["b"] = "bee",
["?"] = "bee"
}
If you look closely at the code linked in the question, you'll see that this line is part of a table constructor (the part inside {}). It is not a full statement on its own. As mentioned in the comments, it would be a syntax error outside of a table constructor. ['?'] is simply a string key.
The other posts already explained what that code does, so let me explain why it needs to be written that way.
['?'] = function() return is_canadian and "eh" or "" end is embedded in {}
It is part of a table constructor and assigns a function value to the string key '?'
local tbl = {a = 1} is syntactic sugar for local tbl = {['a'] = 1} or
local tbl = {}
tbl['a'] = 1
String keys that allow that convenient syntax must be valid Lua names: they may only contain letters, digits and underscores, must not start with a digit, and must not be a reserved word such as end or and.
So local a = {? = 1} is not possible; it causes the syntax error "unexpected symbol near '?'". Therefore you have to provide the string key explicitly in square brackets, as in local a = {['?'] = 1}.
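A short sketch of the rule above: keys that are not valid Lua names need the bracket form both in the constructor and when indexing. is_canadian is an assumed flag borrowed from the wiki example, and the "end" key is added purely for illustration.

```lua
-- Sketch: bracket keys for strings that are not valid Lua names.
local is_canadian = true   -- assumed flag, as in the wiki example

local t = {
  a = "aah",                   -- sugar for ["a"] = "aah"
  ["?"] = function() return is_canadian and "eh" or "" end,
  ["end"] = "reserved word",   -- keywords cannot be written as end = ...
}

print(t.a == t["a"])  --> true
print(t["?"]())       --> eh   (t.? and {? = 1} would be syntax errors)
print(t["end"])       --> reserved word
```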
They also gave each table element its own line, as in:
local a = {
1,
2,
3
}
This greatly improves readability for long table elements or very long tables and allows you to maintain a maximum line length.
You'll agree that
local tbl = {
z = function() return is_canadian and "zed" or "zee" end,
['?'] = function() return is_canadian and "eh" or "" end
}
looks a lot cleaner than
local tbl = {z = function() return is_canadian and "zed" or "zee" end,['?'] = function() return is_canadian and "eh" or "" end}

How to extract integer after "=" sign using ruby

I'm trying to extract the integers after mrp= and talktime=.
var i=0;
var recharge=[];
var recharge_text=[];
var recharge_String="";
var mrp="";
var talktime="";
var validity="";
var mode="";mrp='1100';
talktime='1200.00';
validity='NA';
mode='E-Recharge';
if(typeof String.prototype.trim !== 'function') {
String.prototype.trim = function() {
return this.replace(/^ +| +$/g, '');
}
}
mrp=mrp.trim();
if(isNaN(mrp))
{
recharge_text.push({MRP:mrp, Talktime:talktime, Validity:validity ,Mode:mode});
}
else
{
mrp=parseInt(mrp);
recharge.push({MRP:mrp, Talktime:talktime, Validity:validity ,Mode:mode});
}
mrp='2200';
talktime='2400.00';
I've extracted the above text from a webpage, but I do not know how to extract that particular part alone.
You can use regular expressions to parse strings and extract parts of them:
my_text = "blablabla" #just imagine that this is your text
regex_mrp = /mrp='(.+?)';/ #extracts whatever is between single quotes after mrp
regex_talktime = /talktime='(.+?)';/ #extracts whatever is between single quotes after talktime
mrp = my_text.match(regex_mrp)[1].to_i #gets the match, and converts to integer
talktime = my_text.match(regex_talktime)[1].to_f #gets the match, and converts to float
Here's a quick reference to the regular expressions syntax : https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
I'd do something like this:
string = <<EOT
var i=0;
var recharge=[];
var recharge_text=[];
var recharge_String="";
var mrp="";
var talktime="";
var validity="";
var mode="";mrp='1100';
talktime='1200.00';
validity='NA';
mode='E-Recharge';
if(typeof String.prototype.trim !== 'function') {
String.prototype.trim = function() {
return this.replace(/^ +| +$/g, '');
}
}
mrp=mrp.trim();
if(isNaN(mrp))
{
recharge_text.push({MRP:mrp, Talktime:talktime, Validity:validity ,Mode:mode});
}
else
{
mrp=parseInt(mrp);
recharge.push({MRP:mrp, Talktime:talktime, Validity:validity ,Mode:mode});
}
mrp='2200';
talktime='2400.00';
EOT
hits = string.scan(/(?:mrp|talktime)='[\d.]+'/)
# => ["mrp='1100'", "talktime='1200.00'", "mrp='2200'", "talktime='2400.00'"]
This gives us an array of hits using scan, where the pattern /(?:mrp|talktime)='[\d.]+'/ matched in the string. Figuring out how the pattern works is left as an exercise for the reader, but Ruby's Regexp documentation explains it all.
Cleaning that up to be a bit more useful:
hash = hits.map{ |s|
str, val = s.split('=')
[str, val.delete("'")]
}.each_with_object(Hash.new { |h, k| h[k] = [] }){ |(str, val), h| h[str] << val }
You also need to read about each_with_object and what's happening with Hash.new as those are important concepts to learn in Ruby.
At this point, hash is a hash of arrays:
hash # => {"mrp"=>["1100", "2200"], "talktime"=>["1200.00", "2400.00"]}
You can easily extract a particular variable's values, and can correlate them if need be.
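For comparison with the Lua code elsewhere on this page, here is a rough Lua counterpart of the scan-and-group step. It is a sketch, not a translation of any library call: Lua patterns have no | alternation, so it captures every name='value' pair and keeps only mrp and talktime. The shortened input text is an assumption.

```lua
-- Sketch: group the quoted numeric values of mrp and talktime by name.
local text = [[
var mode="";mrp='1100';
talktime='1200.00';
mrp='2200';
talktime='2400.00';
]]

local values = {}
-- Two captures: the variable name and the digits/dots between the single quotes
for name, value in text:gmatch("(%a+)='([%d.]+)'") do
  if name == "mrp" or name == "talktime" then
    values[name] = values[name] or {}
    table.insert(values[name], value)
  end
end

-- values.mrp is {"1100", "2200"}; values.talktime is {"1200.00", "2400.00"}
print(tonumber(values.mrp[1]) + 1)  --> 1101
```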
What if I get a string instead of an integer after the "=" sign?
...
string.scan(/(?:tariff)='[\p{Print}]+'/)
It's important to understand what the pattern is doing. The regular expression engine has some gotchas that can drastically affect the speed of a search, so indiscriminately throwing in things without understanding what they do can be very costly.
When using (?:...), you're creating a non-capturing group. When you're only matching one item, it's not necessary, nor is it particularly desirable, since it makes the engine do more work. The only time I'd use one is when I need to group alternatives, as in (?:mrp|talktime); since you have only one possible word to match, that's a moot point. So, your pattern should be reduced to:
/tariff='[\p{Print}]+'/
Which, when used, results in:
%(tariff='abcdef abc a').scan(/tariff='[\p{Print}]+'/)
# => ["tariff='abcdef abc a'"]
If you want to capture all non-empty occurrences of the string being assigned, it's easier than what you're doing. I'd use something like:
%(tariff='abcdef abc a').scan(/tariff='.+'/)
# => ["tariff='abcdef abc a'"]
%(tariff='abcdef abc a').scan(/tariff='[^']+'/)
# => ["tariff='abcdef abc a'"]
The second is more rigorous, and possibly safer, as it won't be tricked by a line that has multiple single quotes:
%(tariff='abcdef abc a', 'foo').scan(/tariff='.+'/)
# => ["tariff='abcdef abc a', 'foo'"]
%(tariff='abcdef abc a', 'foo').scan(/tariff='[^']+'/)
# => ["tariff='abcdef abc a'"]
Why that works is for you to figure out.
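The same behaviour can be reproduced with Lua patterns, which makes the reason visible. This is a sketch in Lua, not Ruby: a greedy .+ runs to the end of the line and then backs up to the last quote, while a negated set can never consume a quote and so must stop at the first closing one.

```lua
-- Sketch: greedy ".+" versus the negated set "[^']+" (Lua patterns).
local line = "tariff='abcdef abc a', 'foo'"

-- ".+" grabs everything, then backtracks only as far as the LAST quote
local greedy = line:match("tariff='(.+)'")
-- "[^']+" cannot cross a quote, so the match ends at the FIRST closing quote
local bounded = line:match("tariff='([^']+)'")

print(greedy)   --> abcdef abc a', 'foo
print(bounded)  --> abcdef abc a
```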

Lua - util_server.lua:440 attempt to index local 'self' (a nil value)

Good evening
Will you help me solve this problem?
ERROR: race/util_server.lua:440: attempt to index local 'self' (a nil value)
function string:split(separator)
if separator == '.' then
separator = '%.'
end
local result = {}
for part in self:gmatch('(.-)' .. separator) do
result[#result+1] = part
end
result[#result+1] = self:match('.*' .. separator .. '(.*)$') or self
return result
end
You're probably calling it wrong.
function string:split(separator)
is shorthand for:
function string.split(self, separator)
Given a string and separator:
s = 'This is a test'
separator = ' '
You need to call it like this:
string.split(s, separator)
Or:
s:split(separator)
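If you call it like this:
s.split(separator)
you're failing to provide a value for the self argument: the arguments shift by one, so separator is received as self and the separator parameter is nil. And if the string you pass is itself nil (for example string.split(nil, ' ')), self is nil inside the function, which is exactly what this error reports.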
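A small sketch of how the colon syntax fills in self. describe is a hypothetical method added to the string table purely for illustration; the same rules apply to split.

```lua
-- Sketch: how the colon syntax supplies `self` (hypothetical method `describe`).
function string:describe(suffix)
  return "[" .. self .. "]" .. (suffix or "")
end

local s = "abc"
print(s:describe("!"))          --> [abc]!  (s is passed as self)
print(string.describe(s, "!"))  --> [abc]!  (the same call written out)
print(s.describe("!"))          --> [!]     ("!" becomes self, suffix is nil)

-- With no string at all, self is nil and using it raises an error
local ok = pcall(string.describe)
print(ok)                       --> false
```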
Side note: you can write split more simply like this:
function string:split(separators)
local result = {}
for part in self:gmatch('[^'..separators..']+') do
result[#result + 1] = part
end
return result
end
This has the disadvantage that you can't use multi-character strings as delimiters, but the advantage that you can specify more than one delimiter. For instance, you could strip all the punctuation from a sentence and grab just the words:
s = 'This is an example: homemade, freshly-baked pies are delicious!'
for _,word in ipairs(s:split(' :.,!-')) do
print(word)
end
Output:
This
is
an
example
homemade
freshly
baked
pies
are
delicious
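One caveat with the set-based split: the separators are pasted straight into a [^...] set, so characters that are special inside a set (%, ], ^ and -) can break the pattern or change its meaning. A hedged sketch that escapes them first; split_any is a hypothetical name.

```lua
-- Sketch: set-based split with the separator characters escaped.
local function split_any(s, separators)
  -- Prefix %, ], ^ and - with "%" so they are literal inside the set
  local escaped = separators:gsub("[%]%^%-%%]", "%%%0")
  local result = {}
  for part in s:gmatch("[^" .. escaped .. "]+") do
    result[#result + 1] = part
  end
  return result
end

for _, part in ipairs(split_any("a-b%c]d", "-%]")) do
  print(part)  -- prints a, b, c, d on separate lines
end
```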

Capitalize first letter of every word in Lua

I'm able to capitalize the first letter of my string using:
str:gsub("^%l", string.upper)
How can I modify this to capitalize the first letter of every word in the string?
I wasn't able to find any fancy way to do it.
str = "here you have a long list of words"
str = str:gsub("(%l)(%w*)", function(a,b) return string.upper(a)..b end)
print(str)
This code outputs Here You Have A Long List Of Words. %w* could be changed to %w+ to leave one-letter words unchanged.
Fancier solution:
str = string.gsub(" "..str, "%W%l", string.upper):sub(2)
It's impossible to make a real single-regex replace because Lua's pattern system is simple.
In the alternative answer, you get inconsistent results with words containing apostrophes:
str = string.gsub(" "..str, "%W%l", string.upper):sub(2)
will capitalize the first letter after each apostrophe, regardless of whether it's the first letter in the word.
e.g.: "here's a long list of words" outputs "Here'S A Long List Of Words"
To fix this, I found a clever solution
that uses this code:
function titleCase( first, rest )
return first:upper()..rest:lower()
end
string.gsub(str, "(%a)([%w_']*)", titleCase)
This fixes the apostrophe problem. Note that it also lowercases the rest of each word, so "iPhone" becomes "Iphone".
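Both approaches can be run side by side on the example sentence. This sketch only reuses the two snippets above; the variable names are illustrative.

```lua
-- Sketch comparing the two approaches on the example sentence.
local s = "here's a long list of words"

-- Prefix a space so the first word also has a non-word character before it
local naive = (" " .. s):gsub("%W%l", string.upper):sub(2)

local function titleCase(first, rest)
  return first:upper() .. rest:lower()
end
-- The apostrophe belongs to the word set [%w_'], so the "s" in "here's" never starts a word
local fixed = s:gsub("(%a)([%w_']*)", titleCase)

print(naive)  --> Here'S A Long List Of Words
print(fixed)  --> Here's A Long List Of Words
```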
I have a feeling I will be returning to this question when I need to put something in proper title case.
Below is the Lua code to do exactly that.
It has the disadvantage of not preserving the original spacing between words but it's good enough for now.
-- Lua is like python in syntax, and barebones like C -_-
function Set (list)
local set = {}
for _, l in ipairs(list) do set[l] = true end
return set
end
function firstToUpper(str)
return (str:gsub("^%l", string.upper))
end
function titlecase(str)
-- We need to break the string into pieces
local words = {}
for word in string.gmatch(str, '([^%s]+)') do
table.insert(words, word)
end
-- Nothing to capitalize in an empty or whitespace-only string
if #words == 0 then return "" end
-- We need to capitalize anything that is not a:
-- - Article
-- - Coordinating Conjunction
-- - Preposition
-- Thus we have a blacklist of such words
local blacklist = Set {
"at", "but", "by", "down", "for", "from",
"in", "into", "like", "near", "of", "off",
"on", "onto", "out", "over", "past", "plus",
"to", "up", "upon", "with", "nor", "yet",
"so", "the"
}
for index, word in ipairs(words) do
if not blacklist[word] then
words[index] = firstToUpper(word)
end
end
-- First and last words are always capitalized
words[1] = firstToUpper(words[1])
words[#words] = firstToUpper(words[#words])
-- Concat elements in list via space character
return table.concat(words, ' ')
end
print(titlecase("the world"))
print(titlecase("I walked my dog this morning ..."))
print(titlecase("The art of Lua"))
--- Output:
----------------------
--- The World
--- I Walked My Dog This Morning ...
--- The Art of Lua
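The function above loses the original spacing because it rebuilds the string from a word list. A hedged sketch of a spacing-preserving variant: gsub replaces each word in place, so the whitespace between words is untouched. titlecase_keep_spacing and its short list of small words are assumptions for illustration, not the answer's blacklist.

```lua
-- Sketch: title case without losing the original spacing.
-- Assumption: only a few small words are listed; extend as needed.
local small = { the = true, of = true, ["in"] = true, on = true, at = true, to = true }

local function titlecase_keep_spacing(str)
  local total = select(2, str:gsub("%S+", ""))  -- number of words
  local n = 0
  return (str:gsub("%S+", function(word)
    n = n + 1
    -- First and last words are always capitalized, as in the version above
    if n == 1 or n == total or not small[word] then
      return (word:gsub("^%l", string.upper))
    end
    return word
  end))
end

print(titlecase_keep_spacing("the  art of   lua"))  --> The  Art of   Lua
```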
