Iterate Chinese string in Lua / Torch - lua

I have a lua string in Chinese, such as
str = '这是一个中文字符串' -- in English: 'this is a Chinese string'
Now I would like to iterate the string above, to get the following result:
str[1] = '这'
str[2] = '是'
str[3] = '一'
str[4] = '个'
str[5] = '中'
str[6] = '文'
str[7] = '字'
str[8] = '符'
str[9] = '串'
and also output 9 for the length of the string.
Any ideas?

Something like this should work if you are using utf8 module from Lua 5.3 or luautf8, which works with LuaJIT:
local str = '这是一个中文字符串'
local tbl = {}
for p, c in utf8.codes(str) do
table.insert(tbl, utf8.char(c))
end
print(#tbl) -- prints 9

I haven't used non-english characters in lua before and my emulator just puts them in as '?' but something along the lines of this might work:
convert = function ( str )
local temp = {}
for c in str:gmatch('.') do
table.insert(temp, c)
end
return temp
end
This is a simple function that utilizes string.gmatch() to separate the string into individual characters and save them into a table. It would be used like this:
t = convert('abcd')
Which would make 't' a table containing a, b, c and d.
t[1] = a
t[2] = b
...
I am not sure if this will work for the Chinese characters but it is worth a shot.

Related

How do I use more then one pattern for gmatch

Hello I am trying to get some data from a text file and put it into a table.
Im not sure how to add more then one pattern while also doing what I want, I know this pattern by its self %a+ finds letters and %b{} finds brackets, but I am not sure how to combine them together so that I find the letters as a key and the brackets as a value and have them be put into a table that I could use.
text file :
left = {{0,63},{16,63},{32,63},{48,63}}
right = {{0,21},{16,21},{32,21},{48,21}}
up = {{0,42},{16,42},{32,42},{48,42}}
down = {{0,0},{16,0},{32,0},{48,0}}
code:
local function get_animations(file_path)
local animation_table = {}
local file = io.open(file_path,"r")
local contents = file:read("*a")
for k, v in string.gmatch(contents, ("(%a+)=(%b{})")) do -- A gets words and %b{} finds brackets
animation_table[k] = v
print("key : " .. k.. " Value : ".. v)
end
file:close()
end
get_animations("Sprites/Player/MainPlayer.txt")
This is valid Lua code, why not simply execute it?
left = {{0,63},{16,63},{32,63},{48,63}}
right = {{0,21},{16,21},{32,21},{48,21}}
up = {{0,42},{16,42},{32,42},{48,42}}
down = {{0,0},{16,0},{32,0},{48,0}}
If you don't want the data in globals, use the string library to turn it into
return {
left = {{0,63},{16,63},{32,63},{48,63}},
right = {{0,21},{16,21},{32,21},{48,21}},
up = {{0,42},{16,42},{32,42},{48,42}},
down = {{0,0},{16,0},{32,0},{48,0}},
}
befor you execute it.
If you insist on parsing that file you can use a something like this for each line:
local line = "left = {{0,63},{16,63},{32,63},{48,63}}"
print(line:match("^%w+"))
for num1, num2 in a:gmatch("(%d+),(%d+)") do
print(num1, num2)
end
This should be enough to get you started. Of course you wouldn't print those values but put them into a table.

Lua - too many captures. How fix it?

Have problems with this one. If try convert cirilic words or wors have to many symbols and have error
function to_string(t)
local o = {};
for _, v in ipairs(t) do
local b = v < 0 and (0xff + v + 1) or v;
table.insert(o, string.char(b));
end
return table.concat(o);
end
function to_bytes(s)
local c = { s:match( (s:gsub(".", "(.)")) ) };
local o = {};
for _, v in pairs(c) do
table.insert(o, v:byte());
end
return o;
end
local t = to_bytes("If this have to many words или русские");
local out = "\\"
local chars = #t;
for i = 1, chars do
out = out..tostring(t[i]);
if i < chars then
out = out.."\\"
end
end
out = out..""
I think the error is self-explanatory: you have too many captures in your pattern (those groups that are wrapped into parentheses). The default value is 32. You have a couple of options: (1) recompile your Lua version to use a large number (you'll have to modify LUA_MAXCAPTURES value), but keep in mind that this limit is there for a reason and (2) change your pattern to avoid this many captures (possibly splitting into smaller fragments/patterns). You may also consider using more powerful parsers, like LPEG.
You don't need regex to convert string to array of bytes
function to_bytes(s)
return {s:byte(1, -1)}
end

Lua: replace a substring

I have something like
str = "What a wonderful string //011// this is"
I have to replace the //011// with something like convertToRoman(011) and then get
str = "What a wonderful string XI this is"
However, the conversion into roman numbers is no problem here.
It is also possible that the string str didn't has a //...//. In this case it should simply return the same string.
function convertSTR(str)
if not string.find(str,"//") then
return str
else
replace //...// with convertToRoman(...)
end
return str
end
I know that I can use string.find to get the complete \\...\\ sequence. Is there an easier solution with pattern matching or something similiar?
string.gsub accepts a function as a replacement. So, this should work
new = str:gsub("//(.-)//", convertToRoman)
I like LPEG, therefore here is a solution with LPEG:
local lpeg = require"lpeg"
local C, Ct, P, R = lpeg.C, lpeg.Ct, lpeg.P, lpeg.R
local convert = function(x)
return "ROMAN"
end
local slashed = P"//" * (R("09")^1 / convert) * P"//"
local other = C((1 - slashed)^0)
local grammar = Ct(other * (slashed * other)^0)
print(table.concat(grammar:match("What a wonderful string //011// this is"),""))

Lua need to split at comma

I've googled and I'm just not getting it. Seems like such a simple function, but of course Lua doesn't have it.
In Python I would do
string = "cat,dog"
one, two = string.split(",")
and then I would have two variables, one = cat. two = dog
How do I do this in Lua!?
Try this
str = 'cat,dog'
for word in string.gmatch(str, '([^,]+)') do
print(word)
end
'[^,]' means "everything but the comma, the + sign means "one or more characters". The parenthesis create a capture (not really needed in this case).
If you can use libraries, the answer is (as often in Lua) to use Penlight.
If Penlight is too heavy for you and you just want to split a string with a single comma like in your example, you can do something like this:
string = "cat,dog"
one, two = string:match("([^,]+),([^,]+)")
Add this split function on the top of your page:
function string:split( inSplitPattern, outResults )
if not outResults then
outResults = { }
end
local theStart = 1
local theSplitStart, theSplitEnd = string.find( self, inSplitPattern, theStart )
while theSplitStart do
table.insert( outResults, string.sub( self, theStart, theSplitStart-1 ) )
theStart = theSplitEnd + 1
theSplitStart, theSplitEnd = string.find( self, inSplitPattern, theStart )
end
table.insert( outResults, string.sub( self, theStart ) )
return outResults
end
Then do as follows:
local myString = "Flintstone, Fred, 101 Rockledge, Bedrock, 98775, 555-555-1212"
local myTable = myString:split(", ")
for i = 1, #myTable do
print( myTable[i] ) -- This will give your needed output
end
For more information, visit : Tutorial: Lua String Magic
Keep Coding...............:)
-- like C strtok, splits on one more delimiter characters (finds every string not containing any of the delimiters)
function split(source, delimiters)
local elements = {}
local pattern = '([^'..delimiters..']+)'
string.gsub(source, pattern, function(value) elements[#elements + 1] = value; end);
return elements
end
-- example: var elements = split("bye# bye, miss$ american# pie", ",#$# ")
-- returns "bye" "bye" "miss" "american" "pie"
To also handle optional white space you can do:
str = "cat,dog,mouse, horse"
for word in str:gmatch('[^,%s]+') do
print(word)
end
Output will be:
cat
dog
mouse
horse
This is how I do that on mediawiki:
str = "cat,dog"
local result = mw.text.split(str,"%s*,%s*")
-- result[0] will give "cat", result[1] will give "dog"
actually, if you don't care spaces, you can use:
str = "cat,dog"
local result = mw.text.split(str,",")
-- result[0] will give "cat", result[1] will give "dog"
The API used here is implemented in Scribunto MediaWiki extension. Here is the split() method reference documentation and here is the source code for that. It relies on a lot of other capabilities in Scribunto's Lua common libraries, so it will only work for you if you are actually using MediaWiki or plan to import most of the Scribunto common library.
Functions like string.split() are largely unnecessary in Lua since you can
express string operations in LPEG.
If you still need a dedicated function a convenient approach is
to define a splitter factory (mk_splitter() in below snippet)
from which you can then derive custom splitters.
local lpeg = require "lpeg"
local lpegmatch = lpeg.match
local P, C = lpeg.P, lpeg.C
local mk_splitter = function (pat)
if not pat then
return
end
pat = P (pat)
local nopat = 1 - pat
local splitter = (pat + C (nopat^1))^0
return function (str)
return lpegmatch (splitter, str)
end
end
The advantage of using LPEG is that the function accepts
both valid Lua strings and patterns as argument.
Here is how you would use it to create a function that
splits strings at the , character:
commasplitter = mk_splitter ","
print (commasplitter [[foo, bar, baz, xyzzy,]])
print (commasplitter [[a,b,c,d,e,f,g,h]])

Decompressing LZW in Lua [duplicate]

Here is the Pseudocode for Lempel-Ziv-Welch Compression.
pattern = get input character
while ( not end-of-file ) {
K = get input character
if ( <<pattern, K>> is NOT in
the string table ){
output the code for pattern
add <<pattern, K>> to the string table
pattern = K
}
else { pattern = <<pattern, K>> }
}
output the code for pattern
output EOF_CODE
I am trying to code this in Lua, but it is not really working. Here is the code I modeled after an LZW function in Python, but I am getting an "attempt to call a string value" error on line 8.
function compress(uncompressed)
local dict_size = 256
local dictionary = {}
w = ""
result = {}
for c in uncompressed do
-- while c is in the function compress
local wc = w + c
if dictionary[wc] == true then
w = wc
else
dictionary[w] = ""
-- Add wc to the dictionary.
dictionary[wc] = dict_size
dict_size = dict_size + 1
w = c
end
-- Output the code for w.
if w then
dictionary[w] = ""
end
end
return dictionary
end
compressed = compress('TOBEORNOTTOBEORTOBEORNOT')
print (compressed)
I would really like some help either getting my code to run, or helping me code the LZW compression in Lua. Thank you so much!
Assuming uncompressed is a string, you'll need to use something like this to iterate over it:
for i = 1, #uncompressed do
local c = string.sub(uncompressed, i, i)
-- etc
end
There's another issue on line 10; .. is used for string concatenation in Lua, so this line should be local wc = w .. c.
You may also want to read this with regard to the performance of string concatenation. Long story short, it's often more efficient to keep each element in a table and return it with table.concat().
You should also take a look here to download the source for a high-performance LZW compression algorithm in Lua...

Resources