Lua Line Wrapping excluding certain characters - lua

I've found some code that I want to use when I'm writing notes on a MUD I play. Each line of a note can only be 79 characters long, so writing a note is a hassle unless you count characters. The code is below:
function wrap(str, limit, indent, indent1)
  indent = indent or ""
  indent1 = indent1 or indent
  limit = limit or 79
  local here = 1-#indent1
  return indent1..str:gsub("(%s+)()(%S+)()",
    function(sp, st, word, fi)
      if fi-here > limit then
        here = st - #indent
        return "\n"..indent..word
      end
    end)
end
This works great; I can type a 300-character line and it will wrap it at 79 characters, respecting full words.
The problem I cannot figure out how to solve is that sometimes I want to add colour codes to the line, and colour codes should not count towards the character limit. For example:
#GThis is a colour-coded #Yline that should #Bbreak off at 79 #Mcharacters, but ignore #Rthe colour codes (#G, #Y, #B, #M, #R, etc) when doing so.
Essentially, it would strip the colour codes away and break the line appropriately, but without losing the colour codes.
Edited to include what it should check, and what the final output should be.
The function would only check the string below for line breaks:
This is a colour-coded line that should break off at 79 characters, but ignore the colour codes (, , , , , etc) when doing so.
but would actually return:
#GThis is a colour-coded #Yline that should #Bbreak off at 79 #Mcharacters, but ignore
the colour codes (#G, #Y, #B, #M, #R, etc) when doing so.
To complicate things, we also have xterm colour codes, which are similar, but look like this:
#x123
It is always #x followed by a 3-digit number. And lastly, to further complicate things, I don't want it to strip out deliberately escaped colour codes (which would be ##R, ##x123, etc.).
Is there a clean way of doing this that I'm missing?

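Replace the callback passed to gsub with one that discounts the colour codes:
function(sp, st, word, fi)
  -- delta = number of characters in this word that are not displayed
  local delta = 0
  word:gsub('#([#%a])',
    function(c)
      if c == '#' then delta = delta + 1      -- "##" is displayed as a single "#"
      elseif c == 'x' then delta = delta + 5  -- xterm code such as "#x123"
      else delta = delta + 2                  -- single-letter code such as "#G"
      end
    end)
  -- shift the line start so the hidden characters do not count towards the limit
  here = here + delta
  if fi-here > limit then
    here = st - #indent + delta
    return "\n"..indent..word
  end
end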
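For comparison, here is a minimal sketch of a different approach (not the callback above): split the text into words, measure each word's visible length with the colour codes stripped, and build the lines word by word. The names visible_len and wrap_coloured are made up for this illustration. It assumes a colour code is # plus one letter or #x plus three digits, and that ## is an escaped # shown as one character. Unlike the original wrap, it collapses runs of whitespace into single spaces and has no indent support.

```lua
-- Visible length of a word: colour codes count as zero characters,
-- an escaped "##" counts as one (assumed code formats, see above).
function visible_len(word)
  local stripped = word
    :gsub("##", "\1")        -- protect escaped hashes with a placeholder byte
    :gsub("#x%d%d%d", "")    -- drop xterm codes such as #x123
    :gsub("#%a", "")         -- drop single-letter codes such as #G
  return #stripped
end

-- Greedy word wrap that measures words with visible_len.
function wrap_coloured(str, limit)
  limit = limit or 79
  local lines, current, width = {}, {}, 0
  for word in str:gmatch("%S+") do
    local len = visible_len(word)
    -- start a new line if this word (plus a separating space) does not fit
    if #current > 0 and width + 1 + len > limit then
      lines[#lines + 1] = table.concat(current, " ")
      current, width = {}, 0
    end
    if #current > 0 then width = width + 1 end
    current[#current + 1] = word
    width = width + len
  end
  if #current > 0 then
    lines[#lines + 1] = table.concat(current, " ")
  end
  return table.concat(lines, "\n")
end
```

The colour codes stay in the returned string; only the width bookkeeping ignores them.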

Related

Get the range of string in lua in row and column

I'm trying to calculate the range of a given text in terms of row and column.
For following string,
'hello\nworld'
The range should be
{
row_start = 0,
col_start = 0,
row_end = 1,
col_end = 4
}
Here, row_start and col_start are NOT important for the question. world is on the second line, hence row_end is 1. world has 5 characters, hence col_end is 4.
So, I need a function to calculate the number of line breaks and length of the string at the last line to calculate the range.
I couldn't find any other way than calculating the number of line breaks to get row_end. Then reverse the text and find the index of the first newline character to get the col_end. Any other efficient way to do this in Lua?
Given str = "hello\nworld":
"I couldn't find any other way than calculating the number of line breaks to get row_end"
There is no more efficient way: You have to count the line breaks. Assuming UNIX LFs as in your example, you can simply use gmatch for this (which is presumably more efficient than abusing gsub to do the counting for you):
local row_end = 0
for _ in str:gmatch"\n" do row_end = row_end + 1 end
"Then reverse the text and find the index of the first newline character to get the col_end. Any other efficient way to do this in Lua?"
Yes, this is indeed needlessly inefficient. The shortest way to do this in Lua would be using pattern matching:
local col_end = #str - str:find"[^\n]*$"
Explanation: Find the starting index of the longest "run" of non-newline characters. For str, this would be the index of w. Then subtract this index from the length of the string to find the 0-based index of the last character.
A probably more efficient solution would just remember the index after the last newline (and thus have no issue with possibly poor pattern matching performance):
local after_last_newline_idx = 1
for idx in str:gmatch"\n()" do -- () captures the position after the newline
  after_last_newline_idx = idx
end
local col_end = #str - after_last_newline_idx
This could be merged with the first loop to only loop once:
local row_end = 0
local after_last_newline_idx = 1
for idx in str:gmatch"\n()" do -- () captures the position after the newline
  row_end = row_end + 1
  after_last_newline_idx = idx
end
local col_end = #str - after_last_newline_idx
This takes linear time, which is unavoidable. However, it avoids creating a garbage string by reversing str, and it only loops over the string once to find newlines. If gmatch is too slow for your purposes, you can easily use string.byte or string.sub and a numeric for loop to do the looping over newlines yourself.
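The numeric for loop mentioned above could look roughly like this. It is only a sketch: text_range is a made-up name, and it assumes UNIX line feeds (byte value 10) as in the example. If the string ends with a newline, col_end comes out as -1, which callers would need to handle.

```lua
-- Count newlines and remember where the last line starts, in one pass.
function text_range(str)
  local row_end = 0
  local line_start = 1            -- index of the first character of the last line
  for i = 1, #str do
    if str:byte(i) == 10 then     -- 10 is the byte value of "\n"
      row_end = row_end + 1
      line_start = i + 1
    end
  end
  return {
    row_start = 0,
    col_start = 0,
    row_end = row_end,
    col_end = #str - line_start,  -- 0-based column of the last character
  }
end
```

Whether this is actually faster than gmatch depends on the Lua implementation, so it is worth measuring before switching.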

Lua length of Frame for Parsing

I have a binary file which shows gibberish if I open it in Notepad.
I am working on a plugin to use with Wireshark.
I am reading in a file and need to find 'V' '0' '0' '1' (0x56 0x30 0x30 0x31) in it, because that is the start of a header, which means there is a packet inside. I need to do this for the whole file, like parsing. Each frame should start with V 0 0 1, not end with it.
I currently have code that searches for 0x7E and parses on that. What I need is the length of the frame: when V 0 0 1 is found, the length from that V to the position just before the next V 0 0 1 in the file. I can then add that length to the captured length to get the positions that Wireshark can work with.
For example, my imperfect code for working with 0x7E:
local line = file:read()
if not line then
  return false
end
local len = 0
for c in line:gmatch('.') do
  len = len + 1
  if c:byte() == 0x7E then
    break
  end
end
frame.captured_length = len
Another problem here is that the frame ends with 7E, which is wrong. I need something that works properly for 'V' '0' '0' '1'. Maybe I need to use string.find?
Please help me!
That's an example of how my file looks in the hex editor in Visual Studio Code.
Lua has some neat pattern tools. Here's a summary:
(...) captures the text matched inside the parentheses and returns it to us.
-, +, *, ?: "optional, match as little as possible", "mandatory, match as much as possible", "optional, match as much as possible", "optional, match at most once", respectively.
^ and $: anchor the pattern to the start or end of the string, respectively.
We'll be using this universal input and output to test with:
local output = {}
local input = "V001Packet1V001Packet2oooV001aaandweredonehere"
The easiest way to do this is probably to recursively split the string, with one ending at the character before "V", and the other starting at the character after "1". We'll use a pattern which exports the part before and after V001:
local this, next = string.match(input, "(.-)V001(.*)")
print(this,next) --> "", "Packet1V001Packet2..."
Simple enough. Now we need to do it again, and we also need to eliminate the first empty packet, because it's a quirk of the pattern. We can probably just say that any empty this string should not be added:
if this ~= "" then
table.insert(output, this)
end
Now, the last packet will return nil for both this and next, because there will not be another V001 at the end. We can prepare for that by simply adding the last part of the string when the pattern does not match.
All put together:
local function doStep(str)
  local this, next = string.match(str, "(.-)V001(.*)")
  if this then
    -- A V001 marker was found
    if this ~= "" then
      -- Only store non-empty packets
      table.insert(output, this)
    end
    if next ~= "" then
      -- There is more out there!
      doStep(next)
    end
  else
    -- No marker left: the rest of the string is the last packet.
    table.insert(output, str)
  end
end
Of course, this can be improved, but it should be a good starting point. To prove it works, this script:
doStep(input)
print(table.concat(output, "; "))
prints this:
Packet1; Packet2ooo; aaandweredonehere
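Since the question asks for frame lengths rather than packet contents, here is a hedged sketch that uses string.find in plain mode to collect the offset and length of every frame, with each frame starting at its V001 marker. The function name frame_spans and the offset/length field names are invented for this example. It assumes the whole file has already been read into one string (for a binary file, opened in binary mode), and the offsets are 1-based as is usual in Lua.

```lua
-- Return a list of { offset = ..., length = ... } for every frame in data.
-- A frame runs from one marker up to the byte before the next marker
-- (or to the end of the data for the last frame).
function frame_spans(data, marker)
  marker = marker or "V001"
  local starts = {}
  local pos = 1
  while true do
    -- fourth argument true = plain substring search, no pattern matching
    local s = data:find(marker, pos, true)
    if not s then break end
    starts[#starts + 1] = s
    pos = s + #marker
  end
  local spans = {}
  for i, s in ipairs(starts) do
    local next_start = starts[i + 1] or (#data + 1)
    spans[i] = { offset = s, length = next_start - s }
  end
  return spans
end
```

With the test input from above, this yields three spans, the first one starting at offset 1 with length 11 ("V001Packet1"). Any bytes before the first marker are ignored.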

Calculating ISIN checksum

Hi, I know there have been many questions about this here, but I wasn't able to find a detailed enough answer. Wikipedia has two examples of ISINs and how their checksum is calculated.
The part of calculation that I'm struggling with is
Multiply the group containing the rightmost character
The way I understand this statement is:
Iterate through each character from right to left
once you stumble upon a character rather than digit record its position
if the position is an even number double all numeric values in even position
if the position is an odd number double all numeric values in odd position
My understanding has to be wrong because there are at least two problems:
Every ISIN starts with two character country code so position of rightmost character is always the first character
If you omit the first two characters then there is no explanation as to what to do with ISINs that are made up of all numbers (except for first two characters)
Note
isin.org contains even less information on verifying ISINs, they even use the same example as Wikipedia.
I agree with you; the definition on Wikipedia is not the clearest I have seen.
There's a piece of text just before the two examples that explains when one or the other algorithm should be used:
Since the NSIN element can be any alpha numeric sequence (9 characters), an odd number of letters will result in an even number of digits and an even number of letters will result in an odd number of digits. For an odd number of digits, the approach in the first example is used. For an even number of digits, the approach in the second example is used
The NSIN is identical to the ISIN, excluding the first two letters and the last digit; so if the ISIN is US0378331005 the NSIN is 037833100.
So, if you want to verify the checksum digit of US0378331005, you'll have to use the "first algorithm" because there are 9 digits in the NSIN. Conversely, if you want to check AU0000XVGZA3 you're going to use the "second algorithm" because the NSIN contains 4 digits and 5 letters, and since each letter expands to two digits that gives an even total (14).
As to the "first" and "second" algorithms, they're identical, with the only exception that in the former you'll multiply by 2 the group of odd digits, whereas in the latter you'll multiply by 2 the group of even digits.
Now, the good news is, you can get away without this overcomplicated algorithm.
You can, instead:
1. Take the ISIN except the last digit (which you'll want to verify)
2. Convert all letters to numbers (A=10, B=11, ..., Z=35), so as to obtain a list of digits
3. Reverse the list of digits
4. All the digits in an odd position are doubled and their digits summed again if the result is >= 10
5. All the digits in an even position are taken as they are
6. Sum all the digits, take the sum modulo 10, subtract that from 10 and take the result modulo 10 again: (10 - sum % 10) % 10
The only tricky step is #4. Let's clarify it with a mini-example.
Suppose the digits in an odd position are 4, 0, 7.
You'll double them and get: 8, 0, 14.
8 is not >= 10, so we take it as it is. Ditto for 0. 14 is >= 10, so we sum its digits again: 1+4=5.
The result of step #4 in this mini-example is, therefore: 8, 0, 5.
A minimal, working implementation in Python could look like this:
import string
isin = 'US4581401001'
def digit_sum(n):
    return (n // 10) + (n % 10)
alphabet = {letter: value for (value, letter) in
            enumerate(''.join(str(n) for n in range(10)) + string.ascii_uppercase)}
isin_to_digits = ''.join(str(d) for d in (alphabet[v] for v in isin[:-1]))
isin_sum = 0
for (i, c) in enumerate(reversed(isin_to_digits), 1):
    if i % 2 == 1:
        isin_sum += digit_sum(2*int(c))
    else:
        isin_sum += int(c)
checksum_digit = abs(- isin_sum % 10)
assert int(isin[-1]) == checksum_digit
Or, more crammed, just for functional fun:
checksum_digit = abs( - sum(digit_sum(2*int(c)) if i % 2 == 1 else int(c)
                            for (i, c) in enumerate(
                                reversed(''.join(str(d) for d in (alphabet[v] for v in isin[:-1]))), 1)) % 10)

How to refactor string containing variable names into booleans?

I have an SPSS variable containing lines like:
|2|3|4|5|6|7|8|10|11|12|13|14|15|16|18|20|21|22|23|24|25|26|27|28|29|
Every line starts with pipe, and ends with one. I need to refactor it into boolean variables as the following:
var var1 var2 var3 var4 var5
|2|4|5| 0 1 0 1 1
I have tried to do it with a loop like:
loop # = 1 to 72.
compute var# = SUBSTR(var,2#,1).
end loop.
exe.
My code won't work with numbers that are two or more digits long, and it won't place the values into their respective variables. I've also tried nesting char.substr(var,char.rindex(var,'|') + 1) in another loop, with no luck, because it still doesn't let me recognize the variable number.
How can I do it?
This looks like a nice job for the DO REPEAT command. However the type conversion is somewhat tricky:
DO REPEAT var#i=var1 TO var72
/i=1 TO 72.
COMPUTE var#i = (CHAR.INDEX(var,CONCAT("|",LTRIM(STRING(i,F2.0)),"|")) > 0).
END REPEAT.
Explanation: Let's go from the inside to the outside:
STRING(value,F2.0) converts the numeric values into a string of two digits (with a leading white space where the number consist of just one digit), e.g. 2 -> " 2".
LTRIM() removes the leading whitespaces, e.g. " 2" -> "2".
CONCAT() concatenates strings. In the above code it adds the "|" before and after the number, e.g. "2" -> "|2|"
CHAR.INDEX(stringvar,searchstring) returns the position at which the searchstring was found. It returns 0 if the searchstring wasn't found.
CHAR.INDEX(stringvar,searchstring)>0 returns a boolean value indicating if the searchstring was found or not.
It's easier to do the manipulations in Python than native SPSS syntax.
You can use SPSSINC TRANS extension for this purpose.
/* Example data*/.
data list free / TextStr (a99).
begin data.
"|2|3|4|5|6|7|8|10|11|12|13|14|15|16|18|20|21|22|23|24|25|26|27|28|29|"
end data.
/* defining function to achieve task */.
begin program.
def runTask(x):
    numbers=map(int,filter(None,[i.strip() for i in x.lstrip('|').split("|")]))
    answer=[1 if i in numbers else 0 for i in xrange(1,max(numbers)+1)]
    return answer
end program.
/* Run job*/.
spssinc trans result = V1 to V30 type=0 /formula "runTask(TextStr)".
exe.

Lua base converter

I need a base converter function for Lua. I need to convert from base 10 to base 2, 3, 4, ..., 36. How can I do this?
In the string to number direction, the function tonumber() takes an optional second argument that specifies the base to use, which may range from 2 to 36 with the obvious meaning for digits in bases greater than 10.
In the number to string direction, this can be done slightly more efficiently than Nikolaus's answer by something like this:
local floor,insert = math.floor, table.insert
function basen(n,b)
  n = floor(n)
  if not b or b == 10 then return tostring(n) end
  local digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  local t = {}
  local sign = ""
  if n < 0 then
    sign = "-"
    n = -n
  end
  repeat
    local d = (n % b) + 1
    n = floor(n / b)
    insert(t, 1, digits:sub(d,d))
  until n == 0
  return sign .. table.concat(t,"")
end
This creates fewer garbage strings to collect by using table.concat() instead of repeated calls to the string concatenation operator (..). Although it makes little practical difference for strings this small, this idiom should be learned because otherwise building a buffer in a loop with the concatenation operator will actually tend to O(n^2) performance while table.concat() has been designed to do substantially better.
There is an unanswered question as to whether it is more efficient to push the digits on a stack in the table t with calls to table.insert(t,1,digit), or to append them to the end with t[#t+1]=digit, followed by a call to string.reverse() to put the digits in the right order. I'll leave the benchmarking to the student. Note that although the code I pasted here does run and appears to get correct answers, there may be other opportunities to tune it further.
For example, the common case of base 10 is culled off and handled with the built in tostring() function. But similar culls can be done for bases 8 and 16 which have conversion specifiers for string.format() ("%o" and "%x", respectively).
Also, neither Nikolaus's solution nor mine handle non-integers particularly well. I emphasize that here by forcing the value n to an integer with math.floor() at the beginning.
Correctly converting a general floating point value to any base (even base 10) is fraught with subtleties, which I leave as an exercise to the reader.
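For anyone who wants to run that benchmark, the append-then-reverse variant might be sketched as follows. The name basen_append is hypothetical, and the base range check is an addition of this sketch rather than part of the code above. No claim is made here about which variant is faster.

```lua
local DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

-- Same conversion as basen, but digits are appended to the table and the
-- joined string is reversed once at the end.
function basen_append(n, b)
  b = b or 10
  assert(b >= 2 and b <= 36, "base must be between 2 and 36")
  n = math.floor(n)
  local negative = n < 0
  if negative then n = -n end
  local t = {}
  repeat
    local d = n % b + 1
    t[#t + 1] = DIGITS:sub(d, d)   -- least significant digit first
    n = math.floor(n / b)
  until n == 0
  if negative then t[#t + 1] = "-" end  -- the sign ends up in front after reversing
  return table.concat(t):reverse()
end
```

A quick sanity check is to round-trip through tonumber, for example tonumber(basen_append(1234, 36), 36), which should give back 1234.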
You can use a loop to convert an integer into a string in the required base. For bases below 10, use the following code; if you need a larger base, you need to add a line that maps the result of x % base to a character (using an array, for example).
x = 1234
r = ""
base = 8
while x > 0 do
  r = "" .. (x % base ) .. r
  x = math.floor(x / base)
end
print( r );
