Split a string on new lines, but include empty lines - lua

Let's say I have a string with the contents
local my_str = [[
line1
line2

line4
]]
I'd like to get the following table:
{"line1","line2","","line4"}
In other words, I'd like the blank line 3 to be included in my result. I've tried the following:
local result = {};
for line in string.gmatch(my_str, "[^\n]+") do
  table.insert(result, line);
end
However, this produces a result which will not include the blank line 3.
How can I make sure the blank line is included? Am I just using the wrong regex?

Try this instead:
local result = {};
for line in string.gmatch(my_str .. "\n", "(.-)\n") do
  table.insert(result, line);
end
If you don't want the empty fifth element that gives you, then get rid of the blank line at the end of my_str, like this:
local my_str = [[
line1
line2

line4]]
(Note that a newline at the beginning of a long literal is ignored, but a newline at the end is not.)

You can replace the + with *, but that won't work in all Lua versions; LuaJIT will add spurious empty strings to your result, because the pattern can also match the empty string at the end of each line (which isn't even technically wrong).
If your string always includes a newline character at the end of the last line, as in your example, you can just use something like "([^\n]*)\n" to prevent both the spurious empty strings and the final empty string.
In Lua 5.2+ you can also just use a frontier pattern to check for either a newline or the end of the string: [^\n]*%f[\n\0], but that won't work in LuaJIT either.
If you need to support LuaJIT and don't have the trailing newline in your actual string, then you could just add it manually:
string.gmatch(my_str .. "\n", "([^\n]*)\n")

Related

How to remove last line from a string in Lua?

I am using Lua in World of Warcraft.
I have this string:
"This\nis\nmy\nlife."
So when printed, the output is this:
This
is
my
life.
How can I store the entire string except the last line in a new variable?
So I want the output of the new variable to be this:
This
is
my
I want the Lua code to find the last line (regardless of how many lines in the string), remove the last line and store the remaining lines in a new variable.
Thank you.
So I found that Egor Skriptunoff's solutions in the comments worked very well indeed but I am unable to mark his comments as an answer so I'll put his answers here.
This removes the last line and stores the remaining lines in a new variable:
new_str = old_str:gsub("\n[^\n]*$", "")
If there is a new line marker at the end of the last line, Egor posted this as a solution:
new_str = old_str:gsub("\n[^\n]*(\n?)$", "%1")
While this extracts the first line and stores it in a new variable:
first_line = old_str:match("[^\n]*")
Thanks for your help, Egor.
The most efficient solution is a plain string.find.
local s = "This\nis\nmy\nlife." -- string with newlines
local s1 = "Thisismylife." -- string without newlines
local function RemoveLastLine(str)
  local pos = 0 -- start position
  while true do -- loop over all newlines
    -- Find the next newline; `true` requests a plain (non-pattern) search, which is faster on LuaJIT.
    local nl = string.find(str, "\n", pos, true)
    if not nl then break end -- no newline found, or none left
    -- Continue after this newline; the + 1 avoids rescanning the same newline forever.
    pos = nl + 1
  end
  if pos == 0 then return str end -- no newline at all: return the original string
  -- Return everything before the last newline (pos - 2 excludes the newline itself).
  return string.sub(str, 1, pos - 2)
end
print(RemoveLastLine(s))
print(RemoveLastLine(s1))
Keep in mind that this works only for strings with \n-style newlines; if you have \n\r or \r\n, a pattern would be the easier solution.
This solution is efficient for LuaJIT and for long strings.
For small strings, string.sub(s, 1, string.find(s, "\n[^\n]*$") - 1) is fine (though not on LuaJIT), provided the string contains at least one newline.
I scan the string backward because removing the last line is much simpler when scanning from the end than when scanning from the front.
It worked on the first try.
function removeLastLine(str) -- Returns an empty string when there is just one line
  local letters = {}
  for let in string.gmatch(str, ".") do -- Extract letter by letter into a table
    table.insert(letters, let)
  end
  local i = #letters -- We're scanning backward
  while i > 0 do -- Scan from the back
    if letters[i] == "\n" then
      letters[i] = nil
      break
    end
    letters[i] = nil -- Remove the letter from the letters table
    i = i - 1
  end
  return table.concat(letters)
end
print("This\nis\nmy\nlife.")
print(removeLastLine("This\nis\nmy\nlife."))
How the code works:
1. The letters in the str argument are extracted into a table ("Hello" becomes {"H", "e", "l", "l", "o"}).
2. The local i is set to the end of the table, because we scan from back to front.
3. Check whether letters[i] is \n; if it is a newline, go to step 7.
4. Remove the entry at letters[i].
5. Decrement i by 1.
6. Go to step 3 until i is zero; once i is zero, go to step 8.
7. Remove the entry at letters[i], because it was not removed when checking for the newline.
8. Return table.concat(letters). This won't cause an error, because table.concat returns an empty string if the table is empty.
#! /usr/bin/env lua
local serif = "Is this the\nreal life?\nIs this\njust fantasy?"
local reversed = serif :reverse() -- flip it
local pos = reversed :find( '\n' ) +1 -- count backwards
local sans_serif = serif :sub( 1, -pos ) -- strip it
print( sans_serif )
You can write it as a one-liner if you want; the result is the same.
local str = "Is this the\nreal life?\nIs this\njust fantasy?"
print( str :sub( 1, -str :reverse() :find( '\n' ) -1 ) )
Is this the
real life?
Is this

Splitting pattern into multiple tables with gmatch

I am trying to gsplit my text into multiple tables using a pattern.
So this is my input.
\x10Hello\x0AWorld
This is what I expect in my output,
\x0A <- similar inputs will always be 4 chars long
{{'\x10', 'Hello'}, {'\x0A', 'World'}}
This is what I have tried so far.
local function splitIntoTable(input)
  local output = {}
  for code, text in (input):gmatch('(\\x%x+)(.*)') do
    print(code .. ' ' .. text);
    table.insert(output, { code, text })
  end
  return output
end
I made 2 regex groups in gmatch the first group is for the hex and the second group is for the text, I am not sure why this isn't working. The print statement never gets executed so the loop is never being used.
The pattern '\\x%x+' matches a literal backslash, an x, and a sequence of hex digits. It does not match the ASCII character generated by a hexadecimal escape such as '\x0A'.
You need to replace it with a character class in square brackets such as '[\x10\x0A]'. You will have to fill in the character class with whatever ASCII characters (or other bytes) you are expecting in that position in the match.
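Unfortunately, the pattern in the code below will still match only once in a string like '\x10Hello\x0AWorld', because the greedy (.*) consumes the rest of the string. The second part of the pattern also needs to be modified, for example to ([^\x10\x0A]*), so that it stops at the next control character.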
local function splitIntoTable(input)
  local output = {}
  for code, text in (input):gmatch('([\x10\x0A])(.*)') do
    print(code .. ' ' .. text);
    table.insert(output, { code, text })
  end
  return output
end

extract data from string in lua - SubStrings and Numbers

I'm trying to parse a string for a hobby project. I'm self-taught from code snippets on this site and am having a hard time working out this problem. I hope you guys can help.
I have a large string, containing many lines, and each line has a certain format.
I can get each line in the string using this code...
for line in string.gmatch(deckData,'[^\r\n]+') do
  print(line)
end
Each line looks something like this...
3x Rivendell Minstrel (The Hunt for Gollum)
What I am trying to do is make a table that looks something like this for the above line.
table = {}
table['The Hunt for Gollum'].card = 'Rivendell Minstrel'
table['The Hunt for Gollum'].count = 3
So my thinking was to extract everything inside the parentheses, then extract the numeric value. Then delete the first 4 chars in the line, as it will always be '1x ', '2x ' or '3x '
I have tried a bunch of things.. like this...
word=str:match("%((%a+)%)")
but it errors if there are spaces...
my test code looks like this at the moment...
line = '3x Rivendell Minstrel (The Hunt for Gollum)'
num = line:gsub('%D+', '')
print(num) -- Prints "3"
card2Fetch = string.sub(line, 5)
print(card2Fetch) -- Prints "Rivendell Minstrel (The Hunt for Gollum)"
key = string.gsub(card2Fetch, "%s+", "") -- Remove all Spaces
key=key:match("%((%a+)%)") -- Fetch between ()s
print(key) -- Prints "TheHuntforGollum"
Any ideas how to get the "The Hunt for Gollum" text out of there including the spaces?
Try a single pattern capturing all fields:
x,y,z=line:match("(%d+)x%s+(.-)%s+%((.*)%)")
t = {}
t[z] = {}
t[z].card = y
t[z].count = tonumber(x) -- match returns strings, so convert the count to a number
The pattern reads: capture a run of digits before x, skip whitespace, capture everything until whitespace followed by open parenthesis, and finally capture everything until a close parenthesis.

Leading and trailing spaces for each line from textarea

ruby 2.1.3
rails 4.1.7
I want to generate an unordered list from a textarea, so I have to preserve the line breaks between items and remove the leading and trailing spaces from each one.
I'm trying to remove all leading and trailing spaces from each line of the textarea, with no success.
I'm using a regex:
string_from_textarea.gsub(/^[ \t]+|[ \t]+$/, '')
I've also tried the strip and rstrip methods with no luck (they give the same result as the regex).
Leading spaces are removed perfectly from each line.
But for trailing spaces, only the last space of the whole string is removed, and I want them removed from each line.
What am I missing here? What is the deal with textarea and trailing spaces for each line?
UPDATE
Some code example:
I'm using a callback to save the formatted data.
after_validation :format_ingredients
def format_ingredients
  self.ingredients = @ingredients.gsub(/^[ \t]+|[ \t]+$/, "")
end
Form view:
= f.text_area :ingredients, class: 'fieldW-600 striped', rows: '10'
You can use String#strip
' test text with multiple spaces '.strip
#=> "test text with multiple spaces"
To apply this to each line:
str = " test \ntext with multiple \nspaces "
str = str.lines.map(&:strip).join("\n")
#=> "test\ntext with multiple\nspaces"
This isn't a good use for a regexp. Instead use standard String processing methods.
If you have text that contains embedded LF ("\n") line-ends and spaces at the beginning and ends of the lines, then try this:
foo = "
line 1
line 2
line 3
"
foo # => "\n line 1 \n line 2\nline 3\n"
Here's how to clean the lines of leading/trailing white-space and re-add the line-ends:
bar = foo.each_line.map(&:strip).join("\n")
bar # => "\nline 1\nline 2\nline 3"
If you're dealing with CRLF line-ends, as a Windows system would generate text:
foo = "\r\n line 1 \r\n line 2\r\nline 3\r\n"
bar = foo.each_line.map(&:strip).join("\r\n")
bar # => "\r\nline 1\r\nline 2\r\nline 3"
If the text may contain other forms of white-space, such as non-breaking spaces, then switch to a regexp that uses the POSIX [[:space:]] character class, which also matches non-ASCII white-space. I'd do something like:
s.sub(/^[[:space:]]+/, '').sub(/[[:space:]]+$/, '')
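A minimal sketch of how that substitution could be applied to every line. The helper names strip_unicode_space and strip_each_line are hypothetical, not part of Ruby, and \A and \z are used instead of ^ and $ on the assumption that each line is trimmed separately:

```ruby
# Hypothetical helper: trim any white-space matched by [[:space:]]
# (including non-breaking spaces, which String#strip leaves alone)
# from both ends of a single line.
def strip_unicode_space(line)
  line.sub(/\A[[:space:]]+/, '').sub(/[[:space:]]+\z/, '')
end

# Hypothetical helper: apply the trim to every line and re-join with LF.
def strip_each_line(text)
  text.each_line.map { |line| strip_unicode_space(line) }.join("\n")
end

sample = "\u00A0 line 1 \u00A0\n\tline 2\u00A0\n"
puts strip_each_line(sample).inspect
#=> "line 1\nline 2"
```

Note that, like the each_line examples above, this drops the final line-end; append "\n" yourself if you need it.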
I think @sin probably intimated the problem in his/her first comment. Your file was probably produced on a Windows machine that puts a carriage return/line feed pair ("\r\n") at the end of each line other than (presumably) the last, where it just writes \n. (Check line[-2] on any line other than the last.) That would account for the result you are getting:
r = /^[ \t]+|[ \t]+$/
str = " testing 123 \r\n testing again \n"
str.gsub(r, '')
#=> "testing 123 \r\ntesting again\n"
If this theory is correct the fix should be just a slight tweak to your regex:
r = /^[ \t]+|[ \t\r]+$/
str.gsub(r, '')
#=> "testing 123\ntesting again\n"
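To check whether carriage returns really are the culprit, something like the following sketch may help. The method name normalize_and_strip is made up for illustration, and it assumes it is acceptable to convert every CRLF (or lone CR) to LF before trimming:

```ruby
# Hypothetical helper: convert CRLF / lone CR line ends to LF, then trim
# each line. Blank lines survive as empty strings between the "\n"s.
def normalize_and_strip(text)
  text.gsub(/\r\n?/, "\n").each_line.map(&:strip).join("\n")
end

str = " testing 123 \r\n testing again \n"
puts str.include?("\r")                # true means CR characters are present
puts normalize_and_strip(str).inspect  #=> "testing 123\ntesting again"
```

Normalizing first keeps the later steps simple, since $ in a Ruby regexp only anchors before "\n", not before "\r".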
You might be able to do this with your regex by changing the value of the global variable $/, which is the input record separator, a newline by default. That could be a problem for the end of the last line, however, if that only has a newline.
I think you might be looking for String#lstrip and String#rstrip methods:
str = %Q^this is a line
and so is this
all of the lines have two spaces at the beginning
and also at the end ^
> new_string = ""
> ""
str.each_line do |line|
new_string += line.rstrip.lstrip + "\n"
end
> "this is a line\n and so is this \n all of the lines have two spaces at the beginning \n and also at the end "
2.1.2 :034 > puts new_string
this is a line
and so is this
all of the lines have two spaces at the beginning
and also at the end
> new_string
> "this is a line\nand so is this\nall of the lines have two spaces at the beginning\nand also at the end\n"

How can I replace new lines with \n character

Data stored in the database is like this:
This is a line
This is another line
How about this line
When I output it to the view, I want to convert that to:
This is a line\n\nThis is another line\n\nHow about this line
with no new lines and the actual \n characters printed out. How can I do that?
> s = "hi\nthere"
> puts s
hi
there
> puts s.gsub(/\n/, "\\n")
hi\nthere
I would personally use gsub if you only want newlines specifically converted. However, if you want to generally inspect the contents of the string, do this:
str = "This is a line\n\nThis is another line\n\nHow about this line"
puts str.inspect[1..-2]
#=> This is a line\n\nThis is another line\n\nHow about this line
The String#inspect method escapes various 'control' characters in your string. It also wraps the string with ", which I've stripped off above. Note that this may produce undesirable results, e.g. the string My name is "Phrogz" will come out as My name is \"Phrogz\".
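If the quote escaping is unwanted, one possible alternative is to escape only the line-ending characters. This is a sketch, not the only way to do it: escape_line_ends is a hypothetical name, and it relies on String#gsub accepting a Hash as the replacement.

```ruby
# Hypothetical helper: make CR and LF visible as \r and \n, leaving
# quotes and every other character untouched.
def escape_line_ends(str)
  replacements = { "\r" => '\r', "\n" => '\n' }
  str.gsub(/[\r\n]/, replacements)
end

puts escape_line_ends("My name is \"Phrogz\"\n\nNext line")
# prints: My name is "Phrogz"\n\nNext line
```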
> s = "line 1\n\nline2"
=> "line 1\n\nline2"
> puts s
line 1
line2
> puts s.gsub("\n", "\\n")
line 1\n\nline2
The key is to escape the single backslash.
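A short sketch of why the doubled backslash is needed: inside double quotes, "\n" is a single newline character, while "\\n" is two characters, a backslash followed by n. Single quotes produce the same two-character string without doubling. The variable names are only for illustration.

```ruby
newline = "\n"    # one character: LF
escaped = "\\n"   # two characters: backslash, n

puts newline.length      # 1
puts escaped.length      # 2
puts escaped == '\n'     # true: single quotes do not interpret \n

s = "line 1\n\nline2"
puts s.gsub("\n", '\n')  # line 1\n\nline2
```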
