Lua string.format and use of newline or control characters

I'm trying to use string.format for raw output to the UART on NodeMCU.
I'm trying the call
uart.write(0,string.format("loop %03d local: %02d | gmt %02d:%02d:%02d local %02d/%02d/%04d\n",loops,timezonetime,gmthours,gmtmins,gmtsecs,Nmonth,Nday,Nyear))
but the \n is ignored, and the text is concatenated.
print(string.format("loop %03d local: %02d | gmt %02d:%02d:%02d local %02d/%02d/%04d",loops,timezonetime,gmthours,gmtmins,gmtsecs,Nmonth,Nday,Nyear))
works as expected, but I can't control the newline that print() always adds.
How can I use uart.write and string.format to control the output including the placement and use of newline and other control characters?

The issue is a result of newline handling in LuaLoader, which was used to access the NodeMCU board. With PuTTY, the output is as expected.
Here are the results of more detailed testing. It appears that \r does not work in the string parameter passed to uart.write()
-- uart.write Test
print("______first test____________") -- prime the output with a line and newline
uart.write(0,"asdfasdfasdfasdfasdf") -- no newline
print("______should be at end of same line as asdf...______")
uart.write(0,"asdfasdfasdfasdfasdf(newline)\n") -- with newline
print("______should be on line following asdf...____________")
uart.write(0,"asdfasdfasdfasdfasdf(CR)\r") -- with return only
uart.write(0,"OVERWRITE\n") -- overwrite the first part of asdf line, then newline
print("______should be on newline below OVERWRITE line ____________")
Output results:
dofile("uwtest.lua")
______first test____________
asdfasdfasdfasdfasdf______should be at end of same line as asdf...______
asdfasdfasdfasdfasdf(newline)
______should be on line following asdf...____________
asdfasdfasdfasdfasdf(CR)
OVERWRITE
______should be on newline below OVERWRITE line ____________
>
The expected result is that the string "asdfasdfasdfasdfasdf(CR)\r" is followed by a CR but no LF, so the terminal cursor returns to the left margin and OVERWRITE overwrites the start of that line.
This appears to be an issue with the terminal emulation in LuaLoader.
When I connect to the NodeMCU with PuTTY, I get this output:
> dofile("uwtest.lua")
______first test____________
asdfasdfasdfasdfasdf______should be at end of same line as asdf...______
asdfasdfasdfasdfasdf(newline)
______should be on line following asdf...____________
OVERWRITEsdfasdfasdf(CR)
______should be on newline below OVERWRITE line ____________
>
The PuTTY output is as expected.
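To make the expected behaviour concrete, here is a minimal Python sketch (not NodeMCU code) of how a terminal that honours a bare CR would render the last two uart.write() payloads. The function name render_line is hypothetical, and the model only covers CR and ordinary characters on a single line.

```python
def render_line(data: str) -> str:
    """Model one terminal line: CR moves the cursor to column 0, and any
    other character overwrites whatever is at the cursor position."""
    cells = []   # characters currently visible on the line
    cursor = 0   # column where the next character will be written
    for ch in data:
        if ch == "\r":
            cursor = 0
        elif cursor < len(cells):
            cells[cursor] = ch
            cursor += 1
        else:
            cells.append(ch)
            cursor += 1
    return "".join(cells)


# The two uart.write() payloads from the test, without the final "\n".
shown = render_line("asdfasdfasdfasdfasdf(CR)\r" + "OVERWRITE")
print(shown)  # OVERWRITEsdfasdfasdf(CR)
```

This reproduces the line seen in PuTTY. A terminal that treats CR as a full line break, as LuaLoader appears to, would show the two strings on separate lines instead.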

Related

Split a string on new lines, but include empty lines

Let's say I have a string with the contents
local my_str = [[
line1
line2

line4
]]
I'd like to get the following table:
{"line1","line2","","line4"}
In other words, I'd like the blank line 3 to be included in my result. I've tried the following:
local result = {};
for line in string.gmatch(my_str, "[^\n]+") do
  table.insert(result, line);
end
However, this produces a result which will not include the blank line 3.
How can I make sure the blank line is included? Am I just using the wrong regex?
Try this instead:
local result = {};
for line in string.gmatch(my_str .. "\n", "(.-)\n") do
  table.insert(result, line);
end
If you don't want the empty fifth element that gives you, then get rid of the blank line at the end of my_str, like this:
local my_str = [[
line1
line2

line4]]
(Note that a newline at the beginning of a long literal is ignored, but a newline at the end is not.)
You can replace the + with *, but that won't work in all Lua versions; LuaJIT will add random empty strings to your result (which isn't even technically wrong).
If your string always includes a newline character at the end of the last line like in your example, you can just do something like "([^\n]*)\n" to prevent random empty strings and the last empty string.
In Lua 5.2+ you can also just use a frontier pattern to check for either a newline or the end of the string: [^\n]*%f[\n\0], but that won't work in LuaJIT either.
If you need to support LuaJIT and don't have the trailing newline in your actual string, then you could just add it manually:
string.gmatch(my_str .. "\n", "([^\n]*)\n")

0x85 windows 1252 breaks line if file opened with utf-8 encoding

I have a file with an old format from the 70s used in Companies House (UK company registry).
I inherited a parser written 6 years ago which goes line by line and according to a set of conditions extracts the information from the line and inserts them into a dictionary.
There is a weird character that is breaking a line.
I copied this line to a new file with awk '{if(NR==33411) print $0}' PROD216_1950_ew_1.dat > broken and opened broken in vim.
It turns out that the weird character is shown by vim as <85>.
The result is that everything after MAYFIELD is read as a new line.
Below is the line in question:
000376702103032986930001 1993010119941024 193709 0105<BARRY ALEXANDER<GROSVENOR<<<<MAYFIELD 3<41 PLANTATION ROAD<THE PEAK<<HONG KONG<BANK EXECUTIVE<BRITISH<<
in vim becomes
000376702103032986930001 1993010119941024 193709 0105<BARRY ALEXANDER<GROSVENOR<<<<MAYFIELD <85>3<41 PLANTATION ROAD<THE PEAK<<HONG KONG<BANK EXECUTIVE<BRITISH<<
I am using codecs to read this file with a context manager, which I thought was the way of going about it -
Is there anything I am missing? What is that <85>?
import codecs

with codecs.open(filepath, 'r', 'utf-8') as fh:
    for line in fh:
        linetype = determine_line_type(line)
        if linetype == 'header':
            continue
        elif linetype == 'company':
            pass  # do stuff...
        elif linetype == 'officer':
            pass  # do stuff...
vim shows <85> to indicate a hex 85 byte that is invalid in the current encoding (i.e., the encoding it's using to decode the file).
My guess is that the file's encoding is Windows-1252, in which hex 85 denotes the ellipsis character.
So the solution for your parser might be as simple as changing 'utf-8' to 'cp1252' in the codecs.open call.
After searching around for some time, I came up with this solution, which works.
with open(filepath, encoding='utf-8') as fh:
    for line in fh:
        # Re-encode to bytes, drop the UTF-8 sequence C2 85, then decode again.
        byteline = bytearray(line, encoding='utf-8').replace(b'\xc2\x85', b'')
        line_clean = byteline.decode(encoding='utf-8')
        # do stuff with clean line.
The byte sequence that breaks the line is b'\xc2\x85', which is the UTF-8 encoding of U+0085 (NEXT LINE, a control character); the single byte 0x85 is the ellipsis character in Windows-1252.
First encode the string to an array of bytes with bytearray, then use the replace method of the bytearray class, and finally decode the clean line with the decode method, which returns the string without the offending character.
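One plausible explanation for the split, offered as a sketch rather than a diagnosis of this particular file: U+0085 is treated as a line boundary by str.splitlines(), and the stream readers in codecs rely on it when iterating lines, whereas the text layer behind the built-in open() splits only on \n, \r and \r\n. The sample record below is shortened and made up for illustration.

```python
import codecs
import io

# Shortened, hypothetical record containing U+0085 encoded as UTF-8 (C2 85).
raw = ("MAYFIELD \u0085" + "3<41 PLANTATION ROAD\n").encode("utf-8")

# A codecs stream reader (the kind codecs.open wraps) breaks the record in two.
codecs_lines = list(codecs.getreader("utf-8")(io.BytesIO(raw)))

# The io text layer used by the built-in open() keeps it as one line.
io_lines = list(io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8"))

print(len(codecs_lines), len(io_lines))  # 2 1

# In Windows-1252 the single byte 0x85 is the ellipsis character U+2026.
ellipsis = b"\x85".decode("cp1252")
print(ellipsis == "\u2026")  # True
```

If this is what happens with the real file, switching from codecs.open to open may already be enough to keep the record on one line, and the replace call then only removes the stray control character.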

End of line lex

I am writing an interpreter for assembly using lex and yacc. The problem is that I need to parse a word that will strictly be at the end of the file. I've read that there is an anchor, $, which can help. However, it doesn't work as I expected. I wrote this in my lex file:
ABC$ {printf("QWERTY\n");}
The input file is:
ABC
without spaces or any other invisible symbols. So I expect the output to be QWERTY, but what I get is:
ABC
which I guess means that the program couldn't parse it. Then I thought that $ might be a regular symbol in lex, so I changed the input file to this:
ABC$
So, if $ isn't a special symbol, then it will be parsed as a normal symbol, and the output will be QWERTY. This doesn't happen, the output is:
ABC$
The question is whether $ in lex is a normal symbol or special one.
In (f)lex, $ matches zero characters followed by a newline character.
That's different from many regex libraries where $ will match at the end of input. So if your file does not have a newline at the end, as your question indicates (assuming you consider newline to be an invisible character), it won't be matched.
As @sepp2k suggests in a comment, the pattern also won't be matched if the input file happens to use Windows line endings (which consist of the sequence \r\n), unless the flex-generated scanner was compiled for Windows. So if you created the file on Windows and run the flex-generated scanner in a Unix environment, the \r will also cause the pattern to fail to match. In that case, you can use (f)lex's trailing context operator:
ABC/\r?\n { puts("Matched ABC at the end of a line"); }
See the flex documentation for patterns for a full description of the trailing context operator. (Search for "trailing context" on that page; it's roughly halfway down.) $ is exactly equivalent to /\n.
That still won't match ABC at the very end of the file. Matching strings at the very end of the file is a bit tricky, but it can be done with two patterns if it's ok to recognise the string other than at the end of the file, triggering a different action:
ABC/. { /* Do nothing. This ABC is not at the end of a line or the file */ }
ABC { puts("ABC recognised at the end of a line"); }
That works because the first pattern will match as long as there is some non-newline character following ABC. (. matches any character other than a newline. See the above link for details.) If you also need to work with Windows line endings, you'll need to modify the trailing context in the first pattern.
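For contrast with the regex libraries mentioned above, where $ can match at the end of input, here is a small sketch using Python's re module. Python is chosen only as a familiar example of such a library; none of this is (f)lex code.

```python
import re

# End of input with no newline: Python's $ matches, (f)lex's $ would not.
at_eof = bool(re.search(r"ABC$", "ABC"))

# Just before a single trailing newline: this case matches in Python too.
before_final_newline = bool(re.search(r"ABC$", "ABC\n"))

# Before a newline in the middle of the input: Python needs re.MULTILINE.
mid_default = bool(re.search(r"ABC$", "ABC\nDEF"))
mid_multiline = bool(re.search(r"ABC$", "ABC\nDEF", re.MULTILINE))

print(at_eof, before_final_newline, mid_default, mid_multiline)
# True True False True
```

The first case is the one from the question: a library of this kind accepts ABC at the very end of the input, while a (f)lex $ requires an actual newline after it.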

GSub with a plus/minus character

I am trying to convert a text source into an HTML readable page.
The code I have tried:
local newstr=string.gsub(str,"±", "±")
local newstr=string.gsub(str,"%±", "±")
However, the character shows up as  in the output.
I can't seem to find any other documentation on how to handle this specific special character. How do I handle this character when reading in so that it will output properly?
Edit: After trying suggestions I'm able to determine this:
local function sanitizeheader(str)
  if not (str) then return "" end
  str2 = "Depth ±"
  local newstr = string.gsub(str2, string.char(177), "±")
  return newstr
end
In the testing, if I use str2 ± does show up in the output. However, when I try to use str as it is passed in from reading the excel file, it doesn't pick up the character and still returns the  character.
Lua treats strings as sequences of bytes, and ± is a multi-byte character in UTF-8. The code you tried should work, since it just replaces one sequence of bytes with another. However, Lua 5.3 has a utf8 library for handling Unicode characters:
local str="±®ª"
for code in str:gmatch(utf8.charpattern) do
  print("&#" .. utf8.codepoint(code) .. ";")
end
Output:
&#177;
&#174;
&#170;
Check Lua Reference Manual for more info.
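A possible reason the string from the Excel file behaves differently is that it arrives in a single-byte encoding rather than UTF-8. That is an assumption, not something the question confirms. The Python sketch below (not Lua, and with a hypothetical helper name to_numeric_entities) shows that the same visible text has different bytes in the two encodings, and that decoding with the right encoding before emitting numeric entities gives the same result either way.

```python
def to_numeric_entities(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the given encoding and replace every non-ASCII
    character with a decimal HTML entity such as &#177;."""
    text = raw.decode(encoding)
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


# The same visible text is stored as different bytes in different encodings.
as_utf8 = "Depth \u00b1".encode("utf-8")      # ends with bytes C2 B1
as_cp1252 = "Depth \u00b1".encode("cp1252")   # ends with the single byte B1

print(to_numeric_entities(as_utf8, "utf-8"))     # Depth &#177;
print(to_numeric_entities(as_cp1252, "cp1252"))  # Depth &#177;
```

A byte-level replacement written for one encoding will not match the other, which would be consistent with the literal test string working while the imported string does not.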

Print without newline

In BASIC I know of two instructions to print to the screen, PRINT, and WRITE, both of which automatically print strings with a newline at the end. I want to print a string without a newline. How can I do this? I'm using GW-BASIC.
Using PRINT with a semicolon will not print a new line:
10 REM The trailing semicolon prevents a newline
20 PRINT "Goodbye, World!";
Source: Rosettacode

Resources