How do I remove certain special characters from a string in Lua?

Within Lua, how would I remove specific special characters from a string?
For example, a name input would be:
L#)iAm PAGE changed to Liam Page
José Luis changed to Jose Luis
JACK O'NIEL changed to Jack O'Niel
I currently have
firstName = ipFirstName:gsub('[%p%c%s]', '')
lastName = ipLastName:gsub('[%p%c%s]', '')
but it is too broad.

Below is a simple function for sanitizing names, to a certain degree:
local function sanitizeName (name)
local accented = {
['ß'] = 'ss'
, ['à'] = 'a', ['á'] = 'a', ['â'] = 'a', ['ã'] = 'a', ['å'] = 'a'
, ['ä'] = 'ae', ['æ'] = 'ae'
, ['ç'] = 'c'
, ['è'] = 'e', ['é'] = 'e', ['ê'] = 'e', ['ë'] = 'e'
, ['ì'] = 'i', ['í'] = 'i', ['î'] = 'i', ['ï'] = 'i'
, ['ð'] = 'dh'
, ['ñ'] = 'n'
, ['ò'] = 'o', ['ó'] = 'o', ['ô'] = 'o', ['õ'] = 'o', ['ø'] = 'o'
, ['ö'] = 'oe'
, ['ù'] = 'u', ['ú'] = 'u', ['û'] = 'u'
, ['ü'] = 'ue'
, ['ý'] = 'y', ['ÿ'] = 'y'
, ['þ'] = 'th'
}
local sanitized = name
:lower() -- Bring everything to lower case.
:gsub ('%s+', ' ') -- Normalise whitespaces.
-- Replace some non-ASCII characters:
for fancy, plain in pairs (accented) do
sanitized = sanitized:gsub (fancy, plain)
end
return (sanitized
:gsub ("[^%a ']", '') -- Remove everything but ASCII letters, spaces and apostrophes.
:gsub ('^%a', string.upper) -- Capitalise the first letter of the first name.
:gsub ("[ ']%a", string.upper)) -- Capitalise the first letter of other names. The outer parentheses drop gsub's second return value (the match count).
end
for _, name in ipairs {'L#)iAm PAGE', 'José Luis', "JACK O'NIEL"} do
print (name, sanitizeName (name))
end
However, to deal with Unicode characters properly, study this page. Also note that most assumptions about personal names are false.

Related

How can I shift all of the tables down after removing a table?

In this code:
t = {
num = '',
}
t[0].num = '0'
t[1].num = '1'
t[2].num = '2'
Is there a way for me to delete t[0], then shift all of the table's values down, so that afterward it looks like this:
t[0].num = '1'
t[1].num = '2'
Example with imaginary functions:
t = {
num = '',
}
t[0].num = '0'
t[1].num = '1'
t[2].num = '2'
for i=0,tableLength(t) do
print(t[i])
end
--Output: 012
remove(t[0])
for i=0,tableLength(t) do
print(t[i])
end
--Output: 12
t = {
num = '',
}
t[0].num = '0'
t[1].num = '1'
t[2].num = '2'
This code raises an error because it indexes t[0], which is nil.
t has only one field, and that is t.num.
You need to do something like this:
t = {}
for i = 0, 2 do
t[i] = {num = tostring(i)}
end
Use this if you want to create the desired demo table.
As many useful functions in Lua assume 1-based indexing, I'd recommend starting at index 1.
local t = {1,2,3,4,5}
Option 1:
table.remove(t, 1)
Option 2:
t = {table.unpack(t, 2, #t)}
Option 3:
t = table.move(t, 2, #t, 1, t)
t[#t] = nil
Option 4:
for i = 1, #t-1 do
t[i] = t[i+1]
end
t[#t] = nil
There are more options; I won't list them all. Some work in place, while others produce a new table object.
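As stated in this answer, you can create a new table from the result of table.unpack. Because unpacking starts at index 1, the element at t[0] is left out, and the old t[1] and t[2] end up at indices 1 and 2 of the new table: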
t = {table.unpack(t, 1, #t)}

Luamacros for autohotkey

I'm currently using a Lua script by Taran Van Hemert that saves every keypress to a text file; AutoHotkey then detects each keypress based on that file. I'm wondering how I can make Shift work with other keys. For example, when I press Shift+2, it should write "#" to the text file. Does anyone know how to do it?
-- note that some of the code has changed since then (it works better now!)
-- Though, I have since abandoned luamacros, in favor of Interception... which i will abandon in favor of QMK.
-- get luamacros HERE: http://www.hidmacros.eu/forum/viewtopic.php?f=10&t=241#p794
-- plug in your 2nd keyboard, load this script into LUAmacros, and press the triangle PLAY button.
-- Then, press any key on that keyboard to assign logical name ('MACROS') to macro keyboard
clear() --clear the console from last run
local keyboardIdentifier = '0000AAA'
--You need to get the identifier code for the keyboard with name "MACROS"
--This appears about halfway through the SystemID item and looks like 1BB382AF or some other alphanumeric combo.
-- It's usually 7 or 8 characters long.
--Once you have this identifier, replace the value of keyboardIdentifier with it
--Don't ask for keyboard assignment help if the user has manually entered a keyboard identifier
if keyboardIdentifier == '0000AAA' then
lmc_assign_keyboard('MACROS');
else lmc_device_set_name('MACROS', keyboardIdentifier);
end
--This lists connected keyboards
dev = lmc_get_devices()
for key,value in pairs(dev) do
print(key..':')
for key2,value2 in pairs(value) do print(' '..key2..' = '..value2) end
end
print('You need to get the identifier code for the keyboard with name "MACROS"')
print('Then replace the first 0000AAA value in the code with it. This will prevent having to manually identify keyboard every time.')
-- Hide window to tray to keep taskbar tidy
lmc.minimizeToTray = true
--lmc_minimize()
--Start Script
sendToAHK = function (key)
--print('It was assigned string: ' .. key)
local file = io.open("C:\\Users\\Jhon Ryven\\Desktop\\2nd keyboard macros\\keypressed.txt", "w") -- writing this string to a text file on disk is probably NOT the best method. Feel free to program something better!
--If you didn't put your AutoHotKey scripts into C:/AHK, Make sure to substitute the path that leads to your own "keypressed.txt" file, using the double backslashes.
--print("we are inside the text file")
file:write(key)
file:flush() --"flush" means "save." Lol.
file:close()
lmc_send_keys('{F24}') -- This presses F24. Using the F24 key to trigger AutoHotKey is probably NOT the best method. Feel free to program something better!
end
local config = {
[45] = "insert",
[36] = "home",
[33] = "pageup",
[46] = "delete",
[35] = "end",
[34] = "pagedown",
[27] = "escape",
[112] = "F1",
[113] = "F2",
[114] = "F3",
[115] = "F4",
[116] = "F5",
[117] = "F6",
[118] = "F7",
[119] = "F8",
[120] = "F9",
[121] = "F10",
[122] = "F11",
[123] = "F12",
[8] = "backspace",
[220] = "backslash",
[13] = "enter",
[16] = "rShift",
[17] = "rCtrl",
[38] = "up",
[37] = "left",
[40] = "down",
[39] = "right",
[32] = "space",
[186] = "semicolon",
[222] = "singlequote",
[190] = "period",
[191] = "slash",
[188] = "comma",
[219] = "leftbracket",
[221] = "rightbracket",
[189] = "minus",
[187] = "equals",
[96] = "num0",
[97] = "num1",
[98] = "num2",
[99] = "num3",
[100] = "num4",
[101] = "num5",
[102] = "num6",
[103] = "num7",
[104] = "num8",
[105] = "num9",
[106] = "numMult",
[107] = "numPlus",
[108] = "numEnter", --sometimes this is different, check your keyboard
[109] = "numMinus",
[110] = "numDelete",
[111] = "numDiv",
[144] = "numLock", --probably it is best to avoid this key. I keep numlock ON, or it has unexpected effects
[192] = "Sc029", --this is the tilde key just before the number row
[9] = "tab",
[20] = "capslock",
[18] = "alt",
[91] = "winkey",
[string.byte('Q')] = "q",
[string.byte('W')] = "w",
[string.byte('E')] = "e",
[string.byte('R')] = "r",
[string.byte('T')] = "t",
[string.byte('Y')] = "y",
[string.byte('U')] = "u",
[string.byte('I')] = "i",
[string.byte('O')] = "o",
[string.byte('P')] = "p",
[string.byte('A')] = "a",
[string.byte('S')] = "s",
[string.byte('D')] = "d",
[string.byte('F')] = "f",
[string.byte('G')] = "g",
[string.byte('H')] = "h",
[string.byte('J')] = "j",
[string.byte('K')] = "k",
[string.byte('L')] = "l",
[string.byte('Z')] = "z",
[string.byte('X')] = "x",
[string.byte('C')] = "c",
[string.byte('V')] = "v",
[string.byte('B')] = "b",
[string.byte('N')] = "n",
[string.byte('M')] = "m",
[string.byte('0')] = "0",
[string.byte('1')] = "1",
[string.byte('2')] = "2",
[string.byte('3')] = "3",
[string.byte('4')] = "4",
[string.byte('5')] = "5",
[string.byte('6')] = "6",
[string.byte('7')] = "7",
[string.byte('8')] = "8",
[string.byte('9')] = "9",
[255+44] = "printscreen",
[145] = "scrolllock",
}
-- define callback for whole device
lmc_set_handler('MACROS', function(button, direction)
--Ignoring upstrokes ensures keystrokes are not registered twice, but activates faster than ignoring downstrokes. It also allows press and hold behaviour
if (direction == 0) then return end -- ignore key upstrokes.
if type(config[button]) == "string" then
print(' ')
print('Your key ID number is: ' .. button)
print('It was assigned string: ' .. config[button])
sendToAHK(config[button])
else
print(' ')
print('Not yet assigned: ' .. button)
end
end)
According to what I have found online, there should be a fourth parameter, flags, that gives you information about modifier keys.
log_handler = function(button, direction, ts, flags)
print('Callback for device: button ' .. button .. ', direction '..direction..', ts '..ts..', flags '..flags)
end
Alternatively, you can remember whether Shift is pressed. You handle its downstroke, so you know when it is pressed, right?
Edit:
Modify the code like so to get access to the flags parameter.
For further information, I'd refer you to the Lua users manual. You cannot modify what you do not understand. Sorry.
lmc_set_handler('MACROS', function(button, direction, ts, flags)
print(flags)
--Ignoring upstrokes ensures keystrokes are not registered twice, but activates faster than ignoring downstrokes. It also allows press and hold behaviour
if (direction == 0) then return end -- ignore key upstrokes.
if type(config[button]) == "string" then
print(' ')
print('Your key ID number is: ' .. button)
print('It was assigned string: ' .. config[button])
sendToAHK(config[button])
else
print(' ')
print('Not yet assigned: ' .. button)
end
end)

How do I use PyTorch's "translation with a seq2seq" using my own inputs?

I am following the guide here
Currently this is the model:
import re
import unicodedata

SOS_token = 0
EOS_token = 1

class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1

def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s

def readLangs(lang1, lang2, reverse=False):
    print("Reading lines...")
    # Read the file and split into lines
    lines = open('Scribe/%s-%s.txt' % (lang1, lang2), encoding='utf-8').\
        read().strip().split('\n')
    # Split every line into pairs and normalize
    pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines]
    # Reverse pairs, make Lang instances
    if reverse:
        pairs = [list(reversed(p)) for p in pairs]
        input_lang = Lang(lang2)
        output_lang = Lang(lang1)
    else:
        input_lang = Lang(lang1)
        output_lang = Lang(lang2)
    return input_lang, output_lang, pairs

MAX_LENGTH = 5000
eng_prefixes = (
    "i am ", "i m ",
    "he is", "he s ",
    "she is", "she s ",
    "you are", "you re ",
    "we are", "we re ",
    "they are", "they re "
)

def filterPair(p):
    return len(p[0].split(' ')) < MAX_LENGTH and \
        len(p[1].split(' ')) < MAX_LENGTH and \
        p[1].startswith(eng_prefixes)

def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]

def prepareData(lang1, lang2, reverse=False):
    input_lang, output_lang, pairs = readLangs(lang1, lang2, reverse)
    print("Read %s sentence pairs" % len(pairs))
    pairs = filterPairs(pairs)
    print("Trimmed to %s sentence pairs" % len(pairs))
    print("Counting words...")
    for pair in pairs:
        input_lang.addSentence(pair[0])
        output_lang.addSentence(pair[1])
    print("Counted words:")
    print(input_lang.name, input_lang.n_words)
    print(output_lang.name, output_lang.n_words)
    return input_lang, output_lang, pairs
The difference between what I'm trying to do and the guide is that I'm trying to pass in my input languages as a list of strings instead of reading them from a file:
pairs=['string one goes like this', 'string two goes like this']
input_lang = Lang(pairs[0][0])
output_lang = Lang(pairs[1][1])
But it seems that when I try to count the number of words in my string with input_lang.n_words, I always get 2.
Is there something I'm missing in calling the class Lang?
Update:
I ran
language = Lang('english')
for sentence in pairs: language.addSentence(sentence)
print (language.n_words)
and that gave me the number of words in pairs
Though, that doesn't give me input_lang and output_lang like the guide did:
for pair in pairs:
input_lang.addSentence(pair[0])
output_lang.addSentence(pair[1])
First of all, you are initialising the Lang objects with pairs[0][0] and pairs[1][1]. Because pairs is a list of strings, these are single characters, so the calls are the same as Lang('s') and Lang('t').
The Lang object is supposed to store information about a language, so I would expect you to initialise it only once with Lang('english') and then add the sentences from your dataset to it with the Lang.addSentence function.
Right now you aren't loading your dataset into the Lang object at all, so language.n_words is just the initial value it gets when the object is created: self.n_words = 2 # Count SOS and EOS
None of what you are doing in your question makes any sense, but I think what you want is the following:
language = Lang('english')
for sentence in pairs: language.addSentence(sentence)
print (language.n_words)
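To also get separate input_lang and output_lang objects as in the guide, pairs probably needs to be a list of [input sentence, output sentence] pairs rather than a flat list of strings. The following is only a minimal sketch under that assumption: prepare_from_pairs is a hypothetical helper (it is not part of the guide), the names "source" and "target" are placeholders, the second pair is made-up sample data, and the guide's normalizeString and filterPairs steps are skipped for brevity.

```python
SOS_token = 0
EOS_token = 1


class Lang:
    """Vocabulary holder with the same shape as the class in the question."""

    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {SOS_token: "SOS", EOS_token: "EOS"}
        self.n_words = 2  # SOS and EOS are counted from the start

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1


def prepare_from_pairs(pairs, input_name, output_name):
    """Hypothetical helper: build both vocabularies from in-memory pairs."""
    input_lang = Lang(input_name)
    output_lang = Lang(output_name)
    for source, target in pairs:
        input_lang.addSentence(source)   # first element feeds the input vocabulary
        output_lang.addSentence(target)  # second element feeds the output vocabulary
    return input_lang, output_lang


# Assumed data layout: one [input sentence, output sentence] pair per element.
pairs = [
    ["string one goes like this", "string two goes like this"],
    ["another input sentence", "another output sentence"],
]

input_lang, output_lang = prepare_from_pairs(pairs, "source", "target")
print(input_lang.name, input_lang.n_words)
print(output_lang.name, output_lang.n_words)
```

With this layout, the loop from the guide (input_lang.addSentence(pair[0]) and output_lang.addSentence(pair[1])) works unchanged, because pair[0] and pair[1] are whole sentences instead of single characters.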

Pulling XML tags with Lua

I'm currently working on an XML parser and I'm trying to use Lua's pattern matching tools but I'm not getting the desired result. Let's say I have this XML snippet:
<Parent>
<Child>
<Details>Text in Parent tag and Details child tag</Details>
<Division>Text in Parent tag and Division child tag</Division>
</Child>
</Parent>
I need to pull the Parent tag out into a table, followed by any child tags, and their corresponding text data. I already have the pattern for pulling the data figured out:
DATA = "<.->(.-)<"
Likewise for pulling tags individually:
TAGS ="<(%w+)>"
However like I mentioned, I need to differentiate between tags that are nested and tags that aren't. Currently the pattern that's getting the closest result I need is:
CHILDTAG= "<%w->.-<(%w-)>"
This should print only "Child", but it prints "Division" as well, for a reason I can't comprehend. The idea behind the CHILDTAG pattern is that it captures a tag if and only if it has an enclosing tag; the ".-" is there to signify that there may or may not be a newline between the tags. However, I think that's completely wrong, because "\n-" doesn't work, and that signifies a newline. I referred to the documentation and to:
https://www.fhug.org.uk/wiki/wiki/doku.php?id=plugins:understanding_lua_patterns
I use Lua 5.1. I want to parse an XML file of the following pattern. How should I go about it?
Lua XML extract from pattern
Simple XML parser (Named entities in XML are not supported)
local symbols = {lt = '<', gt = '>', amp = '&', quot = '"', apos = "'", nbsp = ' ', euro = '€', copy = '©', reg = '®'}
local function unicode_to_utf8(codepoint)
-- converts numeric unicode to string containing single UTF-8 character
local t, h = {}, 127
while codepoint > h do
local low6 = codepoint % 64
codepoint = (codepoint - low6) / 64
t[#t+1] = 128 + low6
h = 288067 % h
end
t[#t+1] = 254 - 2*h + codepoint
return string.char((table.unpack or unpack)(t)):reverse()
end
local function unescape(text)
return (
(text..'<![CDATA[]]>'):gsub('(.-)<!%[CDATA%[(.-)]]>',
function(not_cdata, cdata)
return
not_cdata
:gsub('%s', ' ')
--:gsub(' +', ' ') -- only for html
:gsub('^ +', '')
:gsub(' +$', '')
:gsub('&(%w+);', symbols)
:gsub('&#(%d+);', function(u) return unicode_to_utf8(tonumber(u)) end)
:gsub('&#[xX](%x+);', function(u) return unicode_to_utf8(tonumber(u, 16)) end)
..cdata
end
)
)
end
function parse_xml(xml)
local tag_stack = {}
local result = {find_child_by_tag = {}}
for text_before_tag, closer, tag, attrs, self_closer in xml
:gsub('^%s*<%?xml.-%?>', '') -- remove prolog (the literal "?" must be escaped as "%?")
:gsub('^%s*<!DOCTYPE[^[>]+%[.-]>', '')
:gsub('^%s*<!DOCTYPE.->', '')
:gsub('<!%-%-.-%-%->', '') -- remove comments
:gmatch'([^<]*)<(/?)([%w_]+)(.-)(/?)>'
do
table.insert(result, unescape(text_before_tag))
if result[#result] == '' then
result[#result] = nil
end
if closer ~= '' then
local parent_pos, parent
repeat
parent_pos = table.remove(tag_stack)
if not parent_pos then
error("Closing unopened tag: "..tag)
end
parent = result[parent_pos]
until parent.tag == tag
local elems = parent.elems
for pos = parent_pos + 1, #result do
local child = result[pos]
table.insert(elems, child)
if type(child) == 'table' then
--child.find_parent = parent
parent.find_child_by_tag[child.tag] = child
end
result[pos] = nil
end
else
local attrs_dict = {}
for names, value in ('\0'..attrs:gsub('%s*=%s*([\'"])(.-)%1', '\0%2\0')..'\0')
:gsub('%z%Z*%z', function(unquoted) return unquoted:gsub('%s*=%s*([%w_]+)', '\0%1\0') end)
:gmatch'%z(%Z*)%z(%Z*)'
do
local last_attr_name
for name in names:gmatch'[%w_]+' do
name = unescape(name)
if last_attr_name then
attrs_dict[last_attr_name] = '' -- boolean attributes (such as "disabled" in html) are converted to empty strings
end
last_attr_name = name
end
if last_attr_name then
attrs_dict[last_attr_name] = unescape(value)
end
end
table.insert(result, {tag = tag, attrs = attrs_dict, elems = {}, find_child_by_tag = {}})
if self_closer == '' then
table.insert(tag_stack, #result)
end
end
end
for _, child in ipairs(result) do
if type(child) == 'table' then
result.find_child_by_tag[child.tag] = child
end
end
-- Now result is a sequence of upper-level tags
-- each tag is a table containing fields: tag (string), attrs (dictionary, may be empty), elems (array, may be empty) and find_child_by_tag (dictionary, may be empty)
-- attrs is a dictionary of attributes
-- elems is a sequence of elements (with preserving their order): tables (nested tags) or strings (text between <tag> and </tag>)
return result
end
Usage example:
local xml= [[
<Parent>
<Child>
<Details>Text in Parent tag and Details child tag</Details>
<Division>Text in Parent tag and Division child tag</Division>
</Child>
</Parent>
]]
xml = parse_xml(xml)
--> both these lines print "Text in Parent tag and Division child tag"
print(xml[1].elems[1].elems[2].elems[1])
print(xml.find_child_by_tag.Parent.find_child_by_tag.Child.find_child_by_tag.Division.elems[1])
What parsed xml looks like:
xml = {
find_child_by_tag = {Parent = ...},
[1] = {
tag = "Parent",
attrs = {},
find_child_by_tag = {Child = ...},
elems = {
[1] = {
tag = "Child",
attrs = {},
find_child_by_tag = {Details = ..., Division = ...},
elems = {
[1] = {
tag = "Details",
attrs = {},
find_child_by_tag = {},
elems = {[1] = "Text in Parent tag and Details child tag"}
},
[2] = {
tag = "Division",
attrs = {},
find_child_by_tag = {},
elems = {[1] = "Text in Parent tag and Division child tag"}
}
}
}
}
}
}

string replacement in Ruby: greentexting support for imageboard

I'm trying to add greentext support to my Rails imageboard (though it should be mentioned that this is strictly a Ruby problem, not a Rails problem).
Basically, what my code does is:
1. chop up a post, line by line
2. look at the first character of each line. if it's a ">", start the greentexting
3. at the end of the line, close the greentexting
4. piece the lines back together
My code looks like this:
def filter_comment(c) #use for both OP's and comments
c1 = c.content
str1 = '<p class = "unkfunc">' #open greentext
str2 = '</p>' #close greentext
if c1 != nil
arr_lines = c1.split('\n') #split the text into lines
arr_lines.each do |a|
if a[0] == ">"
a.insert(0, str1) #add the greentext tag
a << str2 #close the greentext tag
end
end
c1 = ""
arr_lines.each do |a|
strtmp = '\n'
if arr_lines.index(a) == (arr_lines.size - 1) #recombine the lines into text
strtmp = ""
end
c1 += a + strtmp
end
c2 = c1.gsub("\n", '<br/>').html_safe
end
But for some reason, it isn't working! I'm having weird things where greentexting only works on the first line, and if you have greentext on the first line, normal text doesn't work on the second line!
Side note (this may be your problem, without getting too in-depth):
Try joining your array back together with join(), using a double-quoted "\n" so that it is a real newline:
c1 = arr_lines.join("\n")
I think the problem lies in splitting the lines into an array.
names = "Alice \n Bob \n Eve"
names_a = names.split('\n')
=> ["Alice \n Bob \n Eve"]
Note that the string was not split when \n was encountered, because in single quotes '\n' is a literal backslash followed by n, not a newline.
Now let's try this:
names = "Alice \n Bob \n Eve"
names_a = names.split(/\n/)
=> ["Alice ", " Bob ", " Eve"]
Or use "\n" in double quotes (thanks to Eric's comment):
names = "Alice \n Bob \n Eve"
names_a = names.split("\n")
=> ["Alice ", " Bob ", " Eve"]
This was split into an array. Now you can check and append the data you want.
Maybe this is what you want:
def filter_comment(c) #use for both OP's and comments
c1 = c.content
str1 = '<p class = "unkfunc">' #open greentext
str2 = '</p>' #close greentext
if c1 != nil
arr_lines = c1.split(/\n/) #split the text into lines
arr_lines.each do |a|
if a[0] == ">"
a.insert(0, str1) #add the greentext tag
# Use a.insert if you want to keep the existing ">" after the tag: <p class = "unkfunc">>
# Otherwise, just assign a[0] = str1
a << str2 #close the greentext tag
end
end
c1 = arr_lines.join('<br/>')
c2 = c1.html_safe
end # closes "if c1 != nil"
end # closes "def filter_comment"
Hope this helps!
I'm suspecting that your problem is with your CSS (or maybe HTML), not the Ruby. Did the resulting HTML look correct to you?
