I am trying to convert a text source into an HTML readable page.
The code I have tried:
local newstr=string.gsub(str,"±", "&plusmn;")
local newstr=string.gsub(str,"%±", "&plusmn;")
However, the character shows up as  in the output.
I can't seem to find any other documentation on how to handle this specific special character. How do I handle this character when reading in so that it will output properly?
Edit: After trying suggestions I'm able to determine this:
local function sanitizeheader(str)
if not(str)then return "" end
local str2 = "Depth ±"
local newstr=string.gsub(str2, string.char(177), "&plusmn;")
return newstr
end
In the testing, if I use str2 ± does show up in the output. However, when I try to use str as it is passed in from reading the excel file, it doesn't pick up the character and still returns the  character.
Lua string assume strings as sequence of bytes. You are trying utf8 multi byte character. The code you are trying should work as it just replacing a sequence of bytes. However, Lua 5.3 has utf8 library to handle unicode character
local str="±®ª"
for code in str:gmatch(utf8.charpattern) do
print("&#" .. utf8.codepoint(code) .. ";")
end
Output:
&#177;
&#174;
&#170;
Check Lua Reference Manual for more info.
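A minimal sketch of the byte-level view, assuming the Excel reader hands back either UTF-8 or a single-byte Latin-1/Windows-1252 string (the question does not say which). The function name `plusminus_to_entity` and its `encoding` parameter are made up for illustration. Decimal escapes are used so the snippet does not depend on how the script file itself is saved.

```lua
-- Hypothetical helper: replace ± with an HTML entity for a known input encoding.
local function plusminus_to_entity(str, encoding)
  if encoding == "utf8" then
    -- U+00B1 is encoded in UTF-8 as the two bytes 0xC2 0xB1 (194, 177).
    return (str:gsub("\194\177", "&plusmn;"))
  end
  -- Assumption: any other value means a single-byte encoding where ± is byte 177.
  return (str:gsub("\177", "&plusmn;"))
end

print(plusminus_to_entity("Depth \194\177", "utf8"))   --> Depth &plusmn;
print(plusminus_to_entity("Depth \177", "latin1"))     --> Depth &plusmn;
```

If the replacement works on a literal typed in the script but not on data read from the file, the two are probably in different encodings. Printing `str:byte(1, -1)` for a short sample shows which bytes actually arrive.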
Related
Probably it's an easy thing, but I'm a Lua beginner...
I'm creating a very simple QSC QSYS plugin to control a projection server using KVL API. Server API is based on hex strings.
For example, this command asks the server to load the playlist with UUID 9bf5455689ed4c019731c6dd3c071f0e:
Controls["LoadSPL"].EventHandler = function()
sock:Write(
"\x06\x0e\x2b\x34\x02\x05\x01\x0a\x0e\x10\x01\x01\x01\x03\x09\x00\x83\x00\x00\x14\x00\x00\x00\x01\x9b\xf5\x45\x56\x89\xed\x4c\x01\x97\x31\xc6\xdd\x3c\x07\x1f\x0e"
)
end
Now I need to be able to create a string with a variable UUID, according to the text indicated in a textbox (or a list of available UUIDs read from the server) in the user interface.
I will concatenate this string to the fixed part of the command.
How can I correctly make a string like
ad17fc696b49454db17d593db3e553e5 become
\xad\x17\xfc\x69\x6b\x49\x45\x4d\xb1\x7d\x59\x3d\xb3\xe5\x53\xe5?
Try this:
local input = "ad17fc696b49454db17d593db3e553e5"
local output = input:gsub("%w%w", function(s) return string.char(tonumber(s, 16)) end)
Explanation: this takes every pair of characters, interprets them as base 16 numeric string, and then takes the character with that number, and uses that to replace the original characters.
EDIT: To make it clear what's going on, and why the other answers are wrong: backslash escape sequences like \xad are a feature of Lua source code. In memory, \xad is represented by a single byte with value 173, just like A is represented by a byte with value 65. Concatenating a literal backslash character with hexadecimal characters does not create an escape sequence, so the conversion has to be done manually with string.char.
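As a rough sketch of the same idea with input validation added, assuming the UUID arrives as a plain hex string from a textbox: the helper name `hex_to_bytes` is hypothetical, and `%x%x` is used instead of `%w%w` so only hex digit pairs are converted.

```lua
-- Hypothetical helper: turn a hex string into the raw bytes it describes.
local function hex_to_bytes(hex)
  -- Reject odd lengths and any character that is not a hex digit.
  assert(#hex % 2 == 0 and not hex:find("%X"), "expected an even number of hex digits")
  -- Parentheses keep only the resulting string, not gsub's match count.
  return (hex:gsub("%x%x", function(pair)
    return string.char(tonumber(pair, 16))
  end))
end

local uuid = hex_to_bytes("ad17fc696b49454db17d593db3e553e5")
print(#uuid)         --> 16
print(uuid:byte(1))  --> 173
```

The result can be concatenated to the fixed part of the command with `..` before it is written to the socket.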
#! /usr/bin/env lua
str = 'ad17fc696b49454db17d593db3e553e5'
strx = ''
for i = 1, #str, 2 do -- loop through every-other position in your string
chars = str :sub( i, i+1 ) -- capture every 2 chars
strx = strx ..'\\x' ..chars
end -- append a literal backslash, the letter x, then those 2 chars
target = [[\xad\x17\xfc\x69\x6b\x49\x45\x4d\xb1\x7d\x59\x3d\xb3\xe5\x53\xe5]]
print( strx, strx == target ) -- print results, and test if it meets expected target
\xad\x17\xfc\x69\x6b\x49\x45\x4d\xb1\x7d\x59\x3d\xb3\xe5\x53\xe5 true
This can be code-golfed into a one-liner
x=''for i=1,#s,2 do x=x..'\\x'..s:sub(i,i+1)end
In Ruby 1.9.3-429, I am trying to parse plain text files with various encodings that will ultimately be converted to UTF-8 strings. Non-ascii characters work fine with a file encoded as UTF-8, but problems come up with non-UTF-8 files.
Simplified example:
File.open(file) do |io|
io.set_encoding("#{charset.upcase}:#{Encoding::UTF_8}")
line, char = "", nil
until io.eof? || char == ?\n || char == ?\r
char = io.readchar
puts "Character #{char} has #{char.each_codepoint.count} codepoints"
puts "SLICE FAIL" unless char == char.slice(0,1)
line << char
end
line
end
Both files are just a single string áÁð encoded appropriately. I have checked that the files have been encoded correctly via $ file -i <file_name>
With a UTF-8 file, I get back:
Character á has 1 codepoints
Character Á has 1 codepoints
Character ð has 1 codepoints
With an ISO-8859-1 file:
Character á has 2 codepoints
SLICE FAIL
Character Á has 2 codepoints
SLICE FAIL
Character ð has 2 codepoints
SLICE FAIL
The way I am interpreting this is readchar is returning an incorrectly converted encoding which is causing slice to return incorrectly.
Is this behavior correct? Or am I specifying the file external encoding incorrectly? I would rather not rewrite this process so I am hoping I am making a mistake somewhere. There are reasons why I am parsing files this way, but I don't think those are relevant to my question. Specifying the internal and external encoding as an option in File.open yielded the same results.
This behavior is a bug. See http://bugs.ruby-lang.org/issues/8516 for details.
I read a file:
local logfile = io.open("log.txt", "r")
data = logfile:read("*a")
print(data)
output:
...
"(\.)\n(\w)", r"\1 \2"
"\n[^\t]", "", x, re.S
...
Yes, the logfile looks awful, as it's full of various commands.
How can I call gsub to remove a line such as "(\.)\n(\w)", r"\1 \2" from the data variable?
The snippet below does not work:
s='"(\.)\n(\w)", r"\1 \2"'
data=data:gsub(s, '')
I guess some escaping needs to be done. Any easy solution?
Update:
local data = [["(\.)\n(\w)", r"\1 \2"
"\n[^\t]", "", x, re.S]]
local s = [["(\.)\n(\w)", r"\1 \2"]]
local function esc(x)
return (x:gsub('%%', '%%%%')
:gsub('^%^', '%%^')
:gsub('%$$', '%%$')
:gsub('%(', '%%(')
:gsub('%)', '%%)')
:gsub('%.', '%%.')
:gsub('%[', '%%[')
:gsub('%]', '%%]')
:gsub('%*', '%%*')
:gsub('%+', '%%+')
:gsub('%-', '%%-')
:gsub('%?', '%%?'))
end
print(data:gsub(esc(s), ''))
This seems to work fine, except that I also need to escape the escape character % itself, since it won't work if % is in the matched string. I tried :gsub('%%', '%%%%') and :gsub('\%', '\%\%'), but neither works.
Update 2:
OK, % can be escaped this way if it is placed first in the chain above, which I have just corrected.
:terrible experience:
Update 3:
Escaping of ^ and $
As stated in Lua manual (5.1, 5.2, 5.3)
A caret ^ at the beginning of a pattern anchors the match at the beginning of the subject string. A $ at the end of a pattern anchors the match at the end of the subject string. At other positions, ^ and $ have no special meaning and represent themselves.
So a better idea would be to escape ^ and $ only when they are found (respectively) at the beginning or the end of the string.
Lua 5.1 - 5.2+ incompatibilities
string.gsub now raises an error if the replacement string contains a % followed by a character other than the permitted % or digit.
There is no need to double every % in the replacement string. See lua-users.
According to Programming in Lua:
The character `%´ works as an escape for those magic characters. So, '%.' matches a dot; '%%' matches the character `%´ itself. You can use the escape `%´ not only for the magic characters, but also for all other non-alphanumeric characters. When in doubt, play safe and put an escape.
Doesn't this mean that you can simply put % in front of every non-alphanumeric character and be fine? This would also be future-proof (in case new special characters are introduced). Like this:
function escape_pattern(text)
return (text:gsub("([^%w])", "%%%1")) -- parentheses drop gsub's second return value (the match count)
end
It worked for me on Lua 5.3.2 (only rudimentary testing was performed). Not sure if it will work with older versions.
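A quick way to check this approach is to escape a string full of magic characters and confirm that it then matches itself literally. This is only a sketch: the sample string is made up, and `%0` (the whole match) is used in the replacement instead of a capture.

```lua
-- Escape every non-alphanumeric character so the text matches itself literally.
local function escape_pattern(text)
  return (text:gsub("[^%w]", "%%%0"))
end

local s = "50% (a+b)^2 $"
print(escape_pattern(s))
-- The escaped pattern finds the literal text starting at position 2 of "x...y".
print(("x" .. s .. "y"):find(escape_pattern(s)))  --> 2    14
```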
Why not:
local quotepattern = '(['..("%^$().[]*+-?"):gsub("(.)", "%%%1")..'])'
string.quote = function(str)
return str:gsub(quotepattern, "%%%1")
end
to escape and then gsub it away?
Try:
line = '"(\\.)\\n(\\w)", r"\\1 \\2"'
rx = '"%(\\%.%)\\n%(\\w%)", r"\\1 \\2"'
print(string.gsub(line, rx, ""))
Escape pattern special characters with %, and write each literal backslash as \\ inside a quoted string.
Try s=[["(\.)\n(\w)", r"\1 \2"]].
Use stringx.replace() from Penlight Lua Libraries instead.
Reference: https://stevedonovan.github.io/Penlight/api/libraries/pl.stringx.html#replace
Implementation (v1.12.0): https://github.com/lunarmodules/Penlight/blob/1.12.0/lua/pl/stringx.lua#L288
Based on their implementation:
function escape(s)
return (s:gsub('[%-%.%+%[%]%(%)%$%^%%%?%*]','%%%1'))
end
function replace(s,old,new,n)
return (string.gsub(s,escape(old),new:gsub('%%','%%%%'),n))
end
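A different technique that sidesteps escaping altogether is to search with `string.find` in plain mode (fourth argument `true`), which treats the search text as a literal substring rather than a pattern. The sketch below is not taken from any of the answers above, and the helper name `remove_plain` is made up.

```lua
-- Remove every literal occurrence of `literal` from `s` without using patterns.
local function remove_plain(s, literal)
  if literal == "" then return s end  -- an empty needle would loop forever
  local parts, pos = {}, 1
  while true do
    local i, j = s:find(literal, pos, true)  -- true = plain substring search
    if not i then break end
    parts[#parts + 1] = s:sub(pos, i - 1)
    pos = j + 1
  end
  parts[#parts + 1] = s:sub(pos)
  return table.concat(parts)
end

local data = [["(\.)\n(\w)", r"\1 \2"
"\n[^\t]", "", x, re.S]]
print(remove_plain(data, [["(\.)\n(\w)", r"\1 \2"]]))
```

The long-bracket literals keep the backslashes exactly as they appear in the log file, so neither the Lua string escapes nor the pattern magic characters need special handling.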
I'm encrypting my Lua code with this script.
local script = string.dump(
function()
local function h4x(strtbl)
buffer=""
for v in strtbl do
buffer=buffer..strtbl[v]
end
return buffer
end
print("encrypted")
end
)
buff=""
for v=1,string.len(script) do --Convert our string into a hex string.
buff=buff..'\\'..string.byte(script,v)
end
file=io.open('encrypted.txt','w') --Output our bytecode into ascii format to encrypted.txt
file:write(buff)
file:flush()
file:close()
The output of encrypted.txt is like "00/12/46/4/2/6/4/62/". How do I decrypt bytecode?
This text is not encrypted. It's just Lua bytecode in hexadecimal.
Discussion of means of disassembling this bytecode into human-readable opcodes is in another question: Lua equivalent to Python dis()?
Obviously it's printing out each byte as a value (which is decimal, even though the comment says it's converted to hex) delimited by a '/'.
All you need to do then is fill an array with the bytes you pull from the string, using tonumber to convert them back to their byte values. This will help with parsing the formatted output.
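The steps described here might look like the sketch below. It assumes the delimiter is the backslash that the script in the question actually writes (the question text shows '/', so adjust the pattern if the file really uses that), and the name `decode_decimal_escapes` is made up.

```lua
-- Rebuild the original byte string from text such as "\76\117\97".
local function decode_decimal_escapes(text)
  local bytes = {}
  -- Each match is one run of decimal digits following a backslash.
  for num in text:gmatch("\\(%d+)") do
    bytes[#bytes + 1] = string.char(tonumber(num))
  end
  return table.concat(bytes)
end

print(decode_decimal_escapes("\\76\\117\\97"))  --> Lua
```

The decoded string is the same bytecode that string.dump produced, so it is still compiled code rather than readable source.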
local a = "te\st"
local b = string.gsub(a,'\','\\\\')
assert(false,b)
What am I doing wrong?
When the assert fires, I want the string te\st to be printed to the screen, but it's not working.
I have a JSON file that I want to decode into a Lua table. I don't need to print anything; I used the assert just to test a local problem.
So what I need is to keep all the data in the JSON file that contains '\'.
Use [[]] instead of "" or '' if you don't want backslash to have special meaning.
Read about literal strings in the manual.
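For illustration, here is a short sketch comparing the two spellings of the same string, plus one way to double the backslashes if they need to survive another round of escape processing. The variable names are arbitrary.

```lua
local a = [[te\st]]   -- long brackets: the backslash is taken literally
local b = "te\\st"    -- quoted string: the backslash must be written twice
print(a == b)         --> true
print(#a)             --> 5

-- Backslash is not a magic character in Lua patterns, so "\\" matches one
-- literal backslash and the replacement "\\\\" inserts two.
local doubled = a:gsub("\\", "\\\\")
print(doubled)        --> te\\st
```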
Have you tried escaping it with the % character instead of \?
I don't know if this will help, but I was having a HELL of a time making Lua's gsub match my string with special characters in it that I wanted treated literally... it turned out that instead of using \ as an escape character, or doubling the character, that I needed to prefix the special character with % to make it be treated literally.
Your question wasn't too clear so I'm not 100% sure what you mean. Do you mean that you want the assert to fire when b is equal to the string "te\st"? If so you can do a simple:
assert(b ~= "te\\st")
Or I suppose...
assert(b ~= a)
You don't need the gsub. But here it is anyways.
local a = "te\\st"
local b = string.gsub(a,'\\','\\')
assert(false,b)