How to replace many symbols with a single word in Lua? - lua

I must replace all of these characters, "①②③④⑤⑥⑦⑧⑨⑩", with "\item".
I have used this code:
stra = string.gsub(text, "①", "\\item")
strb = string.gsub(stra, "②", "\\item")
strc = string.gsub(strb, "③", "\\item")
strd = string.gsub(strc, "④", "\\item")
stre = string.gsub(strd, "⑤", "\\item")
However, this is very verbose. Is there a simpler way to replace all of those items?

local symbols_trans = {
["\226\145\160"]--[[①]] = "\\item1",
["\226\145\161"]--[[②]] = "\\bananas",
["\226\145\162"]--[[③]] = "\\cactus",
["\226\145\163"]--[[④]] = "\\etc",
["\226\145\164"]--[[⑤]] = "\\item5",
["\226\145\165"]--[[⑥]] = "\\item6",
["\226\145\166"]--[[⑦]] = "\\item7",
["\226\145\167"]--[[⑧]] = "\\item8",
["\226\145\168"]--[[⑨]] = "\\item9",
["\226\145\169"]--[[⑩]] = "\\item10",
}
text = string.gsub(text, "(\266\145.)", symbol_trans)
Or if you want to replace them all with"\\item":
text = string.gsub(text,
"\266\145[\160-\169]",
"\\item"
)
[\160-\169] is equivalent to [\160\161\162\163\164\165\166\167\168\169].
See the Lua manual for information on ranges and, in general, Lua patterns.
You could also be fancy:
text = string.gsub(text,
"\266\145([\160-169])",
function(c)
return "\\item"..(string.byte(c)-160+1)
end
)
This will turn ① into \item1, ② into \item2, and so on.

Use a "set" as described in the tutorial: http://lua-users.org/wiki/PatternsTutorial
string.gsub(text, "[①②③④⑤⑥⑦⑧⑨⑩]", "\\item")

Is there a simpler way to replace all of those items?
Not without a Lua pattern matching library that knows what UTF-8 is. Lua is not Unicode aware; it has no idea how to search for Unicode symbols.
If you're using some non-multibyte encoding, then what John suggested might work. But not if it's UTF-8.
For your specific case, you could always do this:
local symbolsToChange { "①", "②", ...}
for i, sym in ipairs(symbolsToChange) do
string.gsub(text, sym, "\\item")
end

Related

How can I use Chinese letters for locals lua

im trying to make locals with Chinese letters
local 屁 = p
or
屁 = p
none of those work
any ways to do it?
You can't do this as "屁" is not a valid Lua identifier.
Lua identifiers can only have letters, numbers and underscores and must not start with a number.
However, you can create a table with a key 屁:
local chinese_letters = {
["屁"] = p
}
And access it as chinese_letters["屁"], for example
local chinese_letters = {
["屁"] = 10
}
print(chinese_letters["屁"])
By the way, the correct name for these chinese characters is Hanzi

Lua unusual variable name (question mark variable)

I have stumbled upon this line of code and I am not sure what the [ ? ] part represents (my guess is it's a sort of a wildcard but I searched it for a while and couldn't find anything):
['?'] = function() return is_canadian and "eh" or "" end
I understand that RHS is a functional ternary operator. I am curious about the LHS and what it actually is.
Edit: reference (2nd example):
http://lua-users.org/wiki/SwitchStatement
Actually, it is quite simple.
local t = {
a = "aah",
b = "bee",
c = "see",
It maps each letter to a sound pronunciation. Here, a need to be pronounced aah and b need to be pronounced bee and so on. Some letters have a different pronunciation if in american english or canadian english. So not every letter can be mapped to a single sound.
z = function() return is_canadian and "zed" or "zee" end,
['?'] = function() return is_canadian and "eh" or "" end
In the mapping, the letter z and the letter ? have a different prononciation in american english or canadian english. When the program will try to get the prononciation of '?', it will calls a function to check whether the user want to use canadian english or another english and the function will returns either zed or zee.
Finally, the 2 following notations have the same meaning:
local t1 = {
a = "aah",
b = "bee",
["?"] = "bee"
}
local t2 = {
["a"] = "aah",
["b"] = "bee",
["?"] = "bee"
}
If you look closely at the code linked in the question, you'll see that this line is part of a table constructor (the part inside {}). It is not a full statement on its own. As mentioned in the comments, it would be a syntax error outside of a table constructor. ['?'] is simply a string key.
The other posts alreay explained what that code does, so let me explain why it needs to be written that way.
['?'] = function() return is_canadian and "eh" or "" end is embedded in {}
It is part of a table constructor and assigns a function value to the string key '?'
local tbl = {a = 1} is syntactic sugar for local tbl = {['a'] = 1} or
local tbl = {}
tbl['a'] = 1
String keys that allow that convenient syntax must follow Lua's lexical conventions and hence may only contain letters, digits and underscore. They must not start with a digit.
So local a = {? = 1} is not possible. It will cause a syntax error unexpected symbol near '?' Therefor you have to explicitly provide a string value in square brackets as in local a = {['?'] = 1}
they gave each table element its own line
local a = {
1,
2,
3
}
This greatly improves readability for long table elements or very long tables and allows you maintain a maximum line length.
You'll agree that
local tbl = {
z = function() return is_canadian and "zed" or "zee" end,
['?'] = function() return is_canadian and "eh" or "" end
}
looks a lot cleaner than
local tbl = {z = function() return is_canadian and "zed" or "zee" end,['?'] = function() return is_canadian and "eh" or "" end}

cl_http_utility not normalizing my url. Why?

Via an enterpreise service consumer I connect to a webservice, which returns me some data, and also url's.
However, I tried all methods of the mentioned class above and NO METHOD seems to convert the unicode-characters inside my url into the proper readable characters.... ( in this case '=' and ';' ) ...
The only method, which runs properly is "is_valid_url", which returns false, when I pass url's like this:
http://not_publish-workflow-dev.hq.not_publish.com/lc/content/forms/af/not_publish/request-datson-internal/v01/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled
What am I missing?
It seems that this format is for json values. Usually = and & don't need to be written with the \u prefix. To decode all \u characters, you may use this code:
DATA(json_value) = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled`.
FIND ALL OCCURRENCES OF REGEX '\\u....' IN json_value RESULTS DATA(matches).
SORT matches BY offset DESCENDING.
LOOP AT matches ASSIGNING FIELD-SYMBOL(<match>).
DATA hex2 TYPE x LENGTH 2.
hex2 = to_upper( substring( val = json_value+<match>-offset(<match>-length) off = 2 ) ).
DATA(uchar) = cl_abap_conv_in_ce=>uccp( hex2 ).
REPLACE SECTION OFFSET <match>-offset LENGTH <match>-length OF json_value WITH uchar.
ENDLOOP.
ASSERT json_value = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId=105862&wcmmode=disabled`.
I hate to answer my own questions, but anyway, I found an own solution, via manually replacing those unicodes. It is similar to Sandra's idea, but able to convert ANY unicode.
I share it here, just in case, any person might also need it.
DATA: lt_res_tab TYPE match_result_tab.
DATA(valid_url) = url.
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
WHILE lines( lt_res_tab ) > 0.
DATA(match) = substring( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length ).
DATA(hex_unicode) = to_upper( match+2 ).
DATA(char) = cl_abap_conv_in_ce=>uccp( uccp = hex_unicode ).
valid_url = replace( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length with = char ).
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
ENDWHILE.
WRITE / url.
WRITE / valid_url.

Iterate Chinese string in Lua / Torch

I have a lua string in Chinese, such as
str = '这是一个中文字符串' -- in English: 'this is a Chinese string'
Now I would like to iterate the string above, to get the following result:
str[1] = '这'
str[2] = '是'
str[3] = '一'
str[4] = '个'
str[5] = '中'
str[6] = '文'
str[7] = '字'
str[8] = '符'
str[9] = '串'
and also output 9 for the length of the string.
Any ideas?
Something like this should work if you are using utf8 module from Lua 5.3 or luautf8, which works with LuaJIT:
local str = '这是一个中文字符串'
local tbl = {}
for p, c in utf8.codes(str) do
table.insert(tbl, utf8.char(c))
end
print(#tbl) -- prints 9
I haven't used non-english characters in lua before and my emulator just puts them in as '?' but something along the lines of this might work:
convert = function ( str )
local temp = {}
for c in str:gmatch('.') do
table.insert(temp, c)
end
return temp
end
This is a simple function that utilizes string.gmatch() to separate the string into individual characters and save them into a table. It would be used like this:
t = convert('abcd')
Which would make 't' a table containing a, b, c and d.
t[1] = a
t[2] = b
...
I am not sure if this will work for the Chinese characters but it is worth a shot.

Decompressing LZW in Lua [duplicate]

Here is the Pseudocode for Lempel-Ziv-Welch Compression.
pattern = get input character
while ( not end-of-file ) {
K = get input character
if ( <<pattern, K>> is NOT in
the string table ){
output the code for pattern
add <<pattern, K>> to the string table
pattern = K
}
else { pattern = <<pattern, K>> }
}
output the code for pattern
output EOF_CODE
I am trying to code this in Lua, but it is not really working. Here is the code I modeled after an LZW function in Python, but I am getting an "attempt to call a string value" error on line 8.
function compress(uncompressed)
local dict_size = 256
local dictionary = {}
w = ""
result = {}
for c in uncompressed do
-- while c is in the function compress
local wc = w + c
if dictionary[wc] == true then
w = wc
else
dictionary[w] = ""
-- Add wc to the dictionary.
dictionary[wc] = dict_size
dict_size = dict_size + 1
w = c
end
-- Output the code for w.
if w then
dictionary[w] = ""
end
end
return dictionary
end
compressed = compress('TOBEORNOTTOBEORTOBEORNOT')
print (compressed)
I would really like some help either getting my code to run, or helping me code the LZW compression in Lua. Thank you so much!
Assuming uncompressed is a string, you'll need to use something like this to iterate over it:
for i = 1, #uncompressed do
local c = string.sub(uncompressed, i, i)
-- etc
end
There's another issue on line 10; .. is used for string concatenation in Lua, so this line should be local wc = w .. c.
You may also want to read this with regard to the performance of string concatenation. Long story short, it's often more efficient to keep each element in a table and return it with table.concat().
You should also take a look here to download the source for a high-performance LZW compression algorithm in Lua...

Resources