String formatting with Unicode characters using Lua

I am trying to align strings that contain Unicode characters.
But it doesn't work.
The spacing is not correct. :(
The Lua version is 5.1.
What is the problem?
local t =
{
"character",
"루아", -- korean
"abc감사합니다123", -- korean
"ab23",
"lua is funny",
"ㅇㅅㅇ",
"美國大將", --chinese
"qwert-54321",
};
for k, v in pairs(t) do
print(string.format("%30s", v));
end
result:----------------------------------------------
character
루아
abc감사합니다123
ab23
lua is funny
ㅇㅅㅇ
美國大將
qwert-54321

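-- Like string.format, but pads %s arguments by UTF-8 character count instead of byte count.
function utf8format(fmt, ...)
local args, strings, pos = {...}, {}, 0
for spec in fmt:gmatch'%%.-([%a%%])' do
pos = pos + 1
local s = args[pos]
if spec == 's' and type(s) == 'string' and s ~= '' then
table.insert(strings, s)
-- Placeholder: \1 followed by \2s, one byte per UTF-8 character of s
-- (continuation bytes \128-\191 are not counted).
args[pos] = '\1'..('\2'):rep(#s:gsub("[\128-\191]", "")-1)
end
end
-- Format the placeholders, then swap each one back for its original string.
return (fmt:format((table.unpack or unpack)(args))
:gsub('\1\2*', function() return table.remove(strings, 1) end)
)
end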
local t =
{
"character",
"루아", -- korean
"abc감사합니다123", -- korean
"ab23",
"lua is funny",
"ㅇㅅㅇ",
"美國大將", --chinese
"qwert-54321",
"∞"
};
for k, v in pairs(t) do
print(utf8format("%30s", v));
end
But there is another problem: in most fonts, Korean and Chinese characters are wider than Latin letters.
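One way to account for wide characters is to pad by display columns rather than by character count. The following is only a sketch: the names WIDE, display_width and pad_display are hypothetical, the range list is a rough assumption rather than the full Unicode East Asian Width table, and the input is assumed to be valid UTF-8 shown in a terminal that renders these characters two columns wide.

```lua
-- Rough list of code point ranges assumed to be double-width (assumption).
local WIDE = {
  { 0x1100, 0x115F },  -- Hangul Jamo leading consonants
  { 0x2E80, 0xA4CF },  -- CJK radicals through Yi, includes CJK ideographs
  { 0xAC00, 0xD7A3 },  -- Hangul syllables
  { 0xF900, 0xFAFF },  -- CJK compatibility ideographs
  { 0xFF00, 0xFF60 },  -- fullwidth forms
}

-- Number of terminal columns s is expected to occupy.
local function display_width(s)
  local width = 0
  -- One lead byte followed by its continuation bytes is one character.
  for ch in s:gmatch("[^\128-\191][\128-\191]*") do
    local w = 1
    if #ch == 3 then
      -- All ranges above lie in U+0800..U+FFFF, i.e. three-byte sequences.
      local b1, b2, b3 = ch:byte(1, 3)
      local cp = (b1 - 0xE0) * 0x1000 + (b2 - 0x80) * 0x40 + (b3 - 0x80)
      for _, range in ipairs(WIDE) do
        if cp >= range[1] and cp <= range[2] then
          w = 2
          break
        end
      end
    end
    width = width + w
  end
  return width
end

-- Right-align s in a field of `width` columns.
local function pad_display(s, width)
  return string.rep(" ", math.max(0, width - display_width(s))) .. s
end

print(pad_display("루아", 30))
print(pad_display("character", 30))
```

With a monospaced font that draws CJK glyphs at exactly twice the Latin width, both lines should end in the same column; with other fonts the alignment can still be off.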

The ASCII strings are all formatted correctly, while the non-ASCII strings are not.
The reason is that the length of a string is counted in bytes, not characters. For instance, with UTF-8 encoding:
print(string.len("美國大將")) -- 12
print(string.len("루아")) -- 6
So %s in string.format treats these two strings as if their widths were 12 and 6.
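A minimal sketch of padding by character count instead, which also works in Lua 5.1. It assumes valid UTF-8 input; utf8len and padleft are hypothetical helper names, not part of the standard library.

```lua
-- Count UTF-8 characters: every byte except the continuation bytes
-- (10xxxxxx, i.e. 128..191) starts a new character.
local function utf8len(s)
  local _, continuation = s:gsub("[\128-\191]", "")
  return #s - continuation
end

-- Right-align s in a field of `width` characters, like "%<width>s".
local function padleft(s, width)
  return string.rep(" ", math.max(0, width - utf8len(s))) .. s
end

print(utf8len("루아"), #"루아")
print(padleft("루아", 30))
print(padleft("ab23", 30))
```

This fixes the character count only; it does not address glyphs that are drawn wider than one column.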

Related

Generating star pattern in LUA

I am new to programming in Lua, and I am not able to solve the question below.
Given a number N, generate a star pattern such that on the first line there are N stars and on the subsequent lines the number of stars decreases by 1.
The pattern generated should have N rows. In every row, every fifth star (*) is replaced with a hash (#). Every row should have the required number of stars (*) and hash (#) symbols.
Sample input and output, where the first line is the number of test cases
This is what I tried, and I am not able to move further:
function generatePattern()
n = tonumber(io.read())
i = n
while(i >= 1)
do
j = 1
while(j<=i)
do
if(j<=i)
then
if(j%5 == 0)
then
print("#");
else
print("*");
end
print(" ");
end
j = j+1;
end
print("\n");
i = i-1;
end
end
tc = tonumber(io.read())
for i=1,tc
do
generatePattern()
end
First, just the stars without hashes. This part is easy:
local function pattern(n)
for i=n,1,-1 do
print(string.rep("*", i))
end
end
To replace each 5th asterisk with a hash, you can extend the expression with the following substitution:
local function pattern(n)
for i=n,1,-1 do
print((string.rep("*", i):gsub("(%*%*%*%*)%*", "%1#")))
end
end
The asterisks in the pattern need to be escaped with a %, since * holds special meaning within Lua patterns.
Note that string.gsub returns 2 values, but they can be truncated to one value by adding an extra set of parentheses, leading to the somewhat awkward-looking form print((..)).
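To see what the extra parentheses do, this short sketch calls the same substitution both ways; the variable name row is only for illustration.

```lua
local row = string.rep("*", 7)

-- Without the extra parentheses, print receives both return values of gsub:
-- the new string and the number of substitutions (here ****#** and 1).
print(row:gsub("(%*%*%*%*)%*", "%1#"))

-- Wrapping the call in parentheses keeps only the first value (****#**).
print((row:gsub("(%*%*%*%*)%*", "%1#")))
```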
Since Lua 5.1, string values share a metatable whose __index field is the string table, so rep can also be called as a method on a string:
--- Lua 5.3
n=10
asterisk='*'
print(asterisk:rep(n))
-- puts out: **********
#! /usr/bin/env lua
for n = tonumber(arg[1]), 1, -1 do
local char = ''
while #char < n do
if #char %5 == 4 then char = char ..'#'
else char = char ..'*'
end -- mod 5
end -- #char
print( char )
end -- arg[1]
chmod +x asterisk.lua
./asterisk.lua 15
Please do not follow this answer since it is bad coding style! I would delete it but SO won't let me. See comment and other answers for better solutions.
My Lua print adds newlines to each printout, therefore I concatenate each character in a string and print the concatenated string out afterwards.
function generatePattern()
n = tonumber(io.read())
i = n
while(i >= 1)
do
output = ""
j = 1
while(j<=i)
do
if(j%5 == 0)
then
output=output .. "#";
else
output=output .. "*";
end
j = j+1;
end
print(output);
i = i-1;
end
end
Also, this code is just yours, minimally transformed to give the correct output. There are plenty of different ways to solve the task; some are faster or more intuitive than others.

regex for matching a string into words but leaving multiple spaces

Here's what I expect. I have a string of numbers that need to be changed into letters (a kind of cipher). A single space moves on to the next letter, and a triple space represents a space in the output. For example, the string "394 29 44 44   141 6" will be decrypted into "Hell No".
function string.decrypt(self)
local output = ""
for i in self:gmatch("%S+") do
for j, k in pairs(CODE) do
output = output .. (i == j and k or "")
end
end
return output
end
Even though it decrypts the numbers correctly, it doesn't work with spaces. So the string I used above decrypts into "HellNo" instead of the expected "Hell No". How can I fix this?
You can use
CODE = {["394"] = "H", ["29"] = "e", ["44"] = "l", ["141"] = "N", ["6"] = "o"}
function replace(match)
local ret = nil
for i, v in pairs(CODE) do
if i == match then
ret = v
end
end
return ret
end
function decrypt(s)
return s:gsub("(%d+)%s?", replace):gsub("  ", " ")
end
print (decrypt("394 29 44 44   141 6"))
Output will contain Hell No. See the Lua demo online.
Here, (%d+)%s? in s:gsub("(%d+)%s?", replace) matches and captures one or more digits, then matches one optional whitespace character (with %s?). The captured value is passed to the replace function, where it is mapped to the character in CODE. Of a triple space, one space is consumed by %s? and two remain, so all double spaces are then replaced with a single space with gsub("  ", " ").
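The same idea can be sketched more compactly by indexing CODE directly instead of looping over it. This assumes words are separated by exactly three spaces, as in the question; the extra parentheses drop the substitution count that gsub also returns.

```lua
local CODE = {["394"] = "H", ["29"] = "e", ["44"] = "l", ["141"] = "N", ["6"] = "o"}

local function decrypt(s)
  -- Step 1: each number, plus at most one following whitespace character,
  -- becomes its letter. Unknown numbers are left unchanged because the
  -- callback returns nil for them.
  local letters = s:gsub("(%d+)%s?", function(num) return CODE[num] end)
  -- Step 2: the two spaces left over from each triple space become one.
  return (letters:gsub("  ", " "))
end

print(decrypt("394 29 44 44   141 6"))
```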

Lua: Type of a character

I need a function
function getCharType(c)
local i = string.byte(c) -- works only for 1 byte chars
if (i > 48) and (i < 57) then return 1 end
if (i > 97) and (i < 122) then return 2 end
return 0
end
which should return
2 - if c is a letter
1 - if c is a digit
0 - if c is a symbol (anything else)
c itself will already be a lower case character: charType = getCharType(string.lower(Character)). If Unicode characters are possible, that would be fine.
With the above getCharType("ö") is 0.
To find out whether a non-ASCII character is an uppercase or lowercase letter or a number, you need Unicode data. Module:Unicode data on Wikipedia has a function like this that uses Module:Unicode data/category (data for the General Category of Unicode characters).
Here's an adaptation of the lookup_category function from Module:Unicode data. I haven't included the Unicode data (Module:Unicode data/category); you will have to copy it from the link above.
local category_data -- set this variable to the table in Module:Unicode data/category above
local floor = math.floor
local function binary_range_search(code_point, ranges)
local low, mid, high
low, high = 1, #ranges
while low <= high do
mid = floor((low + high) / 2)
local range = ranges[mid]
if code_point < range[1] then
high = mid - 1
elseif code_point <= range[2] then
return range
else
low = mid + 1
end
end
return nil
end
function get_category(code_point)
if category_data.singles[code_point] then
return category_data.singles[code_point]
else
local range = binary_range_search(code_point, category_data.ranges)
return range and range[3] or "Cn"
end
end
The function get_category takes a code point (a number) and returns the name of the General Category. I guess the categories you are interested in are Nd (number, decimal digit) and the categories that begin with L (letter).
You will need a function that converts a character to a code point. If the file is encoded in UTF-8 and you are using Lua 5.3, you can use the utf8.codepoint function: get_category(utf8.codepoint('ö')) will result in 'Ll'. You can convert category codes to the number value that your function above uses:
function category_to_number(category)
if category == "Nd" then
return 1
elseif category:sub(1, 1) == "L" then
return 2
else
return 0
end
end
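Put together, the lookup could look like the sketch below. The category_data table here is a tiny hypothetical stand-in for Module:Unicode data/category (the real table covers all of Unicode), a linear scan stands in for the binary search, and the decoder handles only one- and two-byte UTF-8 sequences so that the example also runs on Lua versions without the utf8 library.

```lua
-- Hypothetical miniature data set, for illustration only.
local category_data = {
  singles = {},
  ranges = {
    { 0x30, 0x39, "Nd" },  -- 0-9
    { 0x41, 0x5A, "Lu" },  -- A-Z
    { 0x61, 0x7A, "Ll" },  -- a-z
    { 0xDF, 0xF6, "Ll" },  -- Latin-1 lowercase letters up to ö
  },
}

local function get_category(code_point)
  if category_data.singles[code_point] then
    return category_data.singles[code_point]
  end
  for _, range in ipairs(category_data.ranges) do
    if code_point >= range[1] and code_point <= range[2] then
      return range[3]
    end
  end
  return "Cn"
end

-- Decode the first character of a UTF-8 string (1- and 2-byte forms only).
local function codepoint(c)
  local b1, b2 = c:byte(1, 2)
  if b1 < 0x80 then
    return b1
  end
  return (b1 - 0xC0) * 0x40 + (b2 - 0x80)
end

local function getCharType(c)
  local category = get_category(codepoint(c))
  if category == "Nd" then
    return 1
  elseif category:sub(1, 1) == "L" then
    return 2
  end
  return 0
end

print(getCharType("ö"), getCharType("7"), getCharType("!"))
```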
Works only with ASCII characters (not Unicode)
function getCharType(c)
return #c:rep(3):match(".%w?%a?")-1
end
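A more readable ASCII-only variant, as a sketch: it tests the character against the %d and %a classes directly. The name char_type is hypothetical, and which bytes %a accepts depends on the current C locale (plain ASCII letters in the default "C" locale).

```lua
local function char_type(c)
  if c:match("^%d$") then
    return 1  -- digit
  elseif c:match("^%a$") then
    return 2  -- letter
  end
  return 0    -- anything else, including multi-byte UTF-8 characters
end

print(char_type("5"), char_type("a"), char_type("!"))
```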

Lua find operand in a string

I have a Lua string like "382+323" or "32x291" or "94-23". How can I check for the operator and return its position?
I found that string.find(s, "[+x-]") did not work. Any ideas?
th> str = '5+3'
th> string.find(str, '[+-x]')
1 1
th> string.find(str, '[+x-]')
2 2
[+-x] matches a single character in the range from "+" to "x", which includes the digits; that is why the first call reports position 1.
When you want to use the dash as a literal character and not as the range metacharacter, put it at the start or end of the character set, or escape it as %-.
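The difference between the three spellings can be checked directly; this is only a small demonstration using the strings from the question.

```lua
local str = "5+3"

-- "+-x" is read as a range from "+" (byte 43) to "x" (byte 120),
-- which contains the digits, so the "5" at position 1 matches.
print(str:find("[+-x]"))

-- With the dash last, or escaped as %-, the set is just the three
-- characters +, x and -, so the "+" at position 2 matches.
print(str:find("[+x-]"))
print(str:find("[+%-x]"))

-- The literal dash is found as well, here at position 3.
print(("94-23"):find("[+x-]"))
```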
print("Type an arithmetic expression, such as 382 x 3 / 15")
expr = io.read()
i = 0
while i do
-- Find the next operator, starting just after the previous one.
-- Inside a set, - would otherwise denote a range, so it is escaped with %;
-- escaping + is not required there, but it is harmless.
-- string.find returns the indices where the match starts and ends;
-- here we keep only the start index.
i = expr:find("[%+x%-/]", i+1)
if i then
print("Operator", expr:sub(i, i), "at position", i)
end
end
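The loop can also be written with gmatch and a position capture, which returns the index of each match without tracking a start offset by hand. A sketch, where find_operators is a hypothetical helper name:

```lua
-- Returns a list of {position, operator} pairs for +, -, x and /.
local function find_operators(expr)
  local found = {}
  -- "()" is a position capture: it yields the index where the match starts.
  for pos, op in expr:gmatch("()([+x/%-])") do
    found[#found + 1] = { pos, op }
  end
  return found
end

for _, hit in ipairs(find_operators("382 x 3 / 15")) do
  print("Operator", hit[2], "at position", hit[1])
end
```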

"translating" One character to another in lua

I want to make a lua script that takes the input of a table, then outputs the strings in that table in their full width counterparts, eg
input = {"Hello", " ", "World"}
print(full(table.concat(input)))
and it will print "Ｈｅｌｌｏ　Ｗｏｒｌｄ"
I tried it using this:
local encoding = [[ 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!゛#$%&()*+、ー。/:;〈=〉?@[\\]^_‘{|}~]]
function char(i)
return encoding:sub(i:len(),i:len())
end
function decode(t)
for i=1,#t do t[i]=char(t[i]) end
return table.concat(t)
end
function returns(word, word_eol)
print(char(word_eol[2]))
end
but that did not work.
Note: it is a plugin for HexChat, which is why I have it as print(char(word_eol[2])).
When you hook a command in HexChat, it passes a table containing the command name followed by what was entered after it.
In your char function, encoding:sub(i:len(), i:len()) finds the n-th character of encoding, with n being the length of the character, which will always be one. If I understand correctly, this will do the job, by having a separate alphabet and matching the characters.
local encoding = [[ 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!゛#$%&()*+、ー。/:;〈=〉?@[]^_‘{|}~]]
local decoding = [[ 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&()*+,-./:;{=}?#[]^_'{|}~]]
function char(i)
local l = decoding:find(i,1,true)
return encoding:sub(l,l)
end
function decode(t)
for i=1,#t do t[i]=char(t[i]) end
return table.concat(t)
end
function returns(word, word_eol)
print(char(word_eol[2]))
end
function full(s)
return (s:gsub('.', function(c)
c = c:byte()
if c == 0x20 then
return string.char(0xE3, 0x80, 0x80)
elseif c >= 0x21 and c <= 0x5F then
return string.char(0xEF, 0xBC, c+0x60)
elseif c >= 0x60 and c <= 0x7E then
return string.char(0xEF, 0xBD, c+0x20)
end
end))
end
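The byte offsets in the function above follow from the layout of Unicode: the fullwidth forms U+FF01 to U+FF5E are the ASCII characters 0x21 to 0x7E shifted by 0xFEE0, and the ideographic space is U+3000. The sketch below makes that explicit by computing the code point first and then encoding it as three UTF-8 bytes; utf8_3 and full_by_codepoint are hypothetical names, and the code avoids the utf8 library so it also runs on Lua 5.1.

```lua
-- Encode a code point in the range U+0800..U+FFFF as three UTF-8 bytes.
local function utf8_3(cp)
  local b1 = 0xE0 + math.floor(cp / 0x1000)
  local b2 = 0x80 + math.floor(cp / 0x40) % 0x40
  local b3 = 0x80 + cp % 0x40
  return string.char(b1, b2, b3)
end

local function full_by_codepoint(s)
  -- Only printable ASCII (bytes 32..126) is replaced; other bytes stay as-is.
  return (s:gsub("[\32-\126]", function(ch)
    local b = ch:byte()
    if b == 0x20 then
      return utf8_3(0x3000)   -- ideographic space
    end
    return utf8_3(b + 0xFEE0) -- fullwidth counterpart
  end))
end

print(full_by_codepoint(table.concat({"Hello", " ", "World"})))
```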

Resources