Vim: getline and non-ASCII characters - character-encoding

I use a single function (that I wrote) to check whether the character after the current cursor position is a space:
function Test_caractere_suivant_espace()
  "Test whether the next character is a space"
  let position = getcurpos()
  let ligne = getline(position[1])
  let car_suivant = ligne[position[2]]
  if car_suivant == ' '
    return 1
  else
    return 0
  endif
endfunction
It works well, but only with ASCII characters, not with non-ASCII characters in UTF-8.
Of course, I could test the value of the first two bits of the current byte, but is there any way to get whole UTF-8 characters when indexing what getline returns, rather than a sequence of one-byte values?
A possible solution
DJMcMayhem suggests a solution using
let ligne = split(getline(position[1]), '\zs')
But there is still a problem determining the next character.
Here is the new version of the function:
function Test_caractere_suivant_espace()
  "Test whether the next character is a space"
  let position = getcurpos()
  let ligne = split(getline(position[1]), '\zs')
  let car_suivant = ligne[position[2]]
  echom car_suivant
  if car_suivant == ' '
    return 1
  else
    return 0
  endif
endfunction
On this line
α α α α α α α α α α
if I call the function with the cursor on the second-to-last α, I get
Error detected while processing function LB_content[2]..Test_caractere_suivant_espace:
line 4:
E684: list index out of range: 25
E15: Invalid expression: ligne[position[2]]
line 5:
E121: Undefined variable: car_suivant
E15: Invalid expression: car_suivant
line 6:
E121: Undefined variable: car_suivant
E15: Invalid expression: car_suivant == ' '

Unfortunately, there's no way to do this that I am aware of. :h getline mentions nothing about encoding options, and from this vim mailing list I found, it seems like this problem has been around for a while with no fix.
However, I did figure out a hacky workaround. Instead of working with strings, you can work with a list of characters instead. Indexing into that will give you whole characters instead of individual bytes. Try this:
let ligne = split(getline(position[1]), '\zs')
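The index error in the question comes from mixing two kinds of index: the column reported by getcurpos() is a byte offset, while the list produced by split(..., '\zs') is indexed by character. The Python sketch below (not Vimscript, and the function name is hypothetical) reproduces that arithmetic for the sample line and converts a 1-based byte column into a character index before looking up the next character.

```python
def next_char_after_byte_col(line, byte_col):
    """Return the character after the one starting at 1-based byte column
    byte_col, or '' when the cursor is on the last character."""
    raw = line.encode("utf-8")
    # Decoding the bytes that precede the cursor gives the number of
    # characters before it, i.e. the cursor's 0-based character index.
    char_index = len(raw[: byte_col - 1].decode("utf-8"))
    chars = list(line)  # one list item per character, like split(..., '\zs')
    nxt = char_index + 1
    return chars[nxt] if nxt < len(chars) else ""


line = "α α α α α α α α α α"
print(len(line.encode("utf-8")), len(line))  # byte count vs. character count
print(repr(next_char_after_byte_col(line, 25)))  # cursor on the second-to-last α
```

The sample line has 29 bytes but only 19 characters, so byte column 25 is far past the end of the character list, which matches the E684 message above.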

Related

regex for matching a string into words but leaving multiple spaces

Here's what I expect. I have a string of numbers that need to be changed into letters (a kind of cipher). A single space separates one letter from the next, and a triple space represents a space in the output. For example, the string "394 29 44 44   141 6" should be decrypted into "Hell No".
function string.decrypt(self)
local output = ""
for i in self:gmatch("%S+") do
for j, k in pairs(CODE) do
output = output .. (i == j and k or "")
end
end
return output
end
Even though it decrypts the numbers correctly, it doesn't handle the spaces. So the string I used above decrypts into "HellNo" instead of the expected "Hell No". How can I fix this?
You can use
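CODE = {["394"] = "H", ["29"] = "e", ["44"] = "l", ["141"] = "N", ["6"] = "o"}
function replace(match)
  -- Direct table lookup; returning nil makes gsub keep the original match.
  return CODE[match]
end
function decrypt(s)
  -- Each number consumes at most one trailing space, so a triple space
  -- leaves two spaces behind, which the second gsub collapses into one.
  -- The outer parentheses drop gsub's second return value (the match count).
  return (s:gsub("(%d+)%s?", replace):gsub("  ", " "))
end
print(decrypt("394 29 44 44   141 6"))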
Output will contain Hell No. See the Lua demo online.
Here, s:gsub("(%d+)%s?", replace) matches and captures one or more digits with (%d+) and matches one optional whitespace character with %s?. The captured value is passed to the replace function, where it is mapped to the corresponding character in CODE. Then all double spaces are replaced with a single space with gsub("  ", " ").
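For comparison, the same two-step idea can be sketched in Python with the re module. This is only an illustration of the technique, not part of the Lua answer; it assumes the same CODE table and that a triple space marks a word break.

```python
import re

CODE = {"394": "H", "29": "e", "44": "l", "141": "N", "6": "o"}


def decrypt(s):
    # Replace each number, plus at most one trailing whitespace character,
    # with its letter. Unknown numbers are left untouched.
    partial = re.sub(r"(\d+)\s?", lambda m: CODE.get(m.group(1), m.group(0)), s)
    # A triple space has lost one space above; collapse the remaining two.
    return partial.replace("  ", " ")


print(decrypt("394 29 44 44   141 6"))
```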

String formatting with unicode characters using Lua

I'm trying to align strings that contain Unicode characters, but it doesn't work: the padding is not correct. :(
The Lua version is 5.1.
What is the problem?
local t =
{
"character",
"루아", -- korean
"abc감사합니다123", -- korean
"ab23",
"lua is funny",
"ㅇㅅㅇ",
"美國大將", --chinese
"qwert-54321",
};
for k, v in pairs(t) do
print(string.format("%30s", v));
end
result:----------------------------------------------
character
루아
abc감사합니다123
ab23
lua is funny
ㅇㅅㅇ
美國大將
qwert-54321
function utf8format(fmt, ...)
local args, strings, pos = {...}, {}, 0
for spec in fmt:gmatch'%%.-([%a%%])' do
pos = pos + 1
local s = args[pos]
if spec == 's' and type(s) == 'string' and s ~= '' then
table.insert(strings, s)
args[pos] = '\1'..('\2'):rep(#s:gsub("[\128-\191]", "")-1)
end
end
return (fmt:format((table.unpack or unpack)(args))
:gsub('\1\2*', function() return table.remove(strings, 1) end)
)
end
local t =
{
"character",
"루아", -- korean
"abc감사합니다123", -- korean
"ab23",
"lua is funny",
"ㅇㅅㅇ",
"美國大將", --chinese
"qwert-54321",
"∞"
};
for k, v in pairs(t) do
print(utf8format("%30s", v));
end
But there is another problem: in most fonts, Korean and Chinese characters are wider than Latin letters.
The ASCII strings are all formatted correctly, while the non-ASCII strings are not.
The reason is that the length of each string is counted in bytes. For instance, with UTF-8 encoding:
print(string.len("美國大將")) -- 12
print(string.len("루아")) -- 6
So %s in string.format treats these two strings as if their widths were 12 and 6.
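The three quantities involved (byte count, character count, and display width) can be compared in a short Python sketch. It is an approximation under stated assumptions: the helper names are hypothetical, wide and fullwidth characters are assumed to occupy two terminal columns, and combining characters are not handled.

```python
import unicodedata


def display_width(s):
    """Approximate column count: wide/fullwidth characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in s)


def pad_left(s, width):
    """Right-align s in `width` columns, like %30s but by display width."""
    return " " * max(0, width - display_width(s)) + s


for text in ["character", "루아", "美國大將"]:
    # bytes, characters, estimated columns, then the padded string
    print(len(text.encode("utf-8")), len(text), display_width(text),
          pad_left(text, 30))
```

Whether the columns actually line up still depends on the terminal and font rendering wide characters at exactly twice the width of Latin letters.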

Lua find operand in a string

I have a Lua string like "382+323" or "32x291" or "94-23". How can I find and return the position of the operator?
I found that String.find(s, "[+x-]") did not work. Any ideas?
th> str = '5+3'
th> string.find(str, '[+-x]')
1 1
th> string.find(str, '[+x-]')
2 2
[+-x] matches one character in the range from "+" to "x". The digit "5" falls inside that range, which is why the first call returns position 1.
When you want to use the dash as a literal character and not as the range metacharacter, put it at the start or end of the character set.
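The same range-versus-literal behaviour of the dash exists in Python's re character sets, so it can be checked there as well. This sketch is only an analogy to the Lua patterns above; note that Python match positions are 0-based, while Lua's are 1-based.

```python
import re

expr = "5+3"
# "+-x" inside the brackets is a range from "+" to "x", which includes digits.
range_match = re.search(r"[+-x]", expr)
# With the dash last, the set is just the three literal characters +, x and -.
literal_match = re.search(r"[+x-]", expr)

print(range_match.start(), range_match.group())
print(literal_match.start(), literal_match.group())
```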
print("Type an arithmetic expression, such as 382 x 3 / 15")
expr = io.read()
i = 0
while i do
  -- Find the next operator, starting just after the previous one.
  -- + and - are magic characters in Lua patterns, so each one is
  -- escaped with % to make it literal.
  -- string.find returns the start and end indices of the match;
  -- only the start index is kept here.
  i = expr:find("[%+x%-/]", i + 1)
  if i then
    print("Operator", expr:sub(i, i), "at position", i)
  end
end

Parsing an input file which contains polynomials

Hello experienced pythoners.
The goal is simply to read in my own files which have the following format, and to then apply mathematical operations to these values and polynomials. The files have the following format:
m1:=10:
m2:=30:
Z1:=1:
Z2:=-1:
...
Some very similar variables; next come the Laguerre polynomials
...
F:= (12.58295)*L(0,x)*L(1,y)*L(6,z) + (30.19372)*L(0,x)*L(2,y)*L(2,z) - ...:
Where L stands for a Laguerre polynomial and takes two arguments.
I have written a procedure in Python which splits each line into a left-hand and a right-hand side, using the "=" character as a divider. The format of these files is always the same, but the number of Laguerre polynomials in F can vary.
import re

# Read the whole file and normalise the statement layout.
with open("file.txt", "r") as f:
    linestring = f.read()
linestring = re.sub("\n\n", "\n", linestring)
linestring = re.sub(",\n", ",", linestring)
linestring = re.sub("\\+\n", "+", linestring)
linestring = re.sub(":=\n", ":=", linestring)
linestring = re.sub(":\n", "\n", linestring)
linestring = re.sub(":", "", linestring)
LINES = linestring.split("\n")
for LINE in LINES:
    LINE = re.sub(" ", "", LINE)
    print("LINE=", LINE)
    if len(LINE) <= 0:
        continue  # skip empty lines (a bare `next` here does nothing)
    PAIR = LINE.split("=")
    print("PAIR=", PAIR)
    LHS = PAIR[0]
    RHS = PAIR[1]
    print("LHS=", LHS)
    print("RHS=", RHS)
The first re.sub block just deals with formatting the file and discarding characters that Python will not be able to process; then a loop prints four things, LINE, PAIR, LHS and RHS, and it does this nicely. Using the example file from above, the procedure will print the following:
LINE= m1=1
PAIR= ['m1', '1']
LHS= m1
RHS= 1
LINE= m2=1
PAIR= ['m2', '1']
LHS= m2
RHS= 1
LINE= Z1=-1
PAIR= ['Z1', '-1']
LHS= Z1
RHS= -1
LINE= Z2=-1
PAIR= ['Z2', '-1']
LHS= Z2
RHS= -1
LINE= F= 12.5*L(0,x)L(1,y) + 30*L(0,x)L(2,y)L(2,z)
PAIR=['F', '12.5*L(0,x)L(1,y) + 30*L(0,x)L(2,y)L(2,z)']
LHS= F
RHS= 12.5*L(0,x)L(1,y) + 30*L(0,x)L(2,y)L(2,z)
My question is: what is the best next step to process this output and use it in a mathematical script, especially making L mean a Laguerre polynomial? I tried putting the LHS and RHS into a dictionary, but found it troublesome to put F in it because of the Laguerre polynomials.
Any ideas are welcome. Perhaps I am overcomplicating this and there is a much simpler way to parse this file.
Many thanks in advance
Your parsing algorithm doesn't seem to work correctly, as the RHS values of your variables don't match the input file.
Also, the first re.sub block where you format the file seems overly complicated. Assuming every statement in your input file is terminated by a colon, you could get rid of all whitespace and newlines and separate the statements using the following code:
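import re
with open('file.txt', 'r') as f:
    linestring = f.read()
strippedstring = linestring.replace('\n', '').replace(' ', '')
# Split on every colon not followed by '='; [:-1] drops the empty piece after the final colon.
statements = re.split(r':(?!=)', strippedstring)[:-1]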
Then you iterate over the statements and split each one into LHS and RHS:
for st in statements:
    lhs, rhs = re.split(':=', st)
    print('lhs=', lhs)
    print('rhs=', rhs)
In the next step, try to distinguish normal float variables and polynomials:
    # evaluate rhs (this block continues the body of the for loop)
    try:
        # interpret as numeric constant
        f = float(rhs)
        print(" ", f)
    except ValueError:
        # interpret as Laguerre polynomial
        summands = re.split(r'\+', re.sub('-', '+-', rhs))
        for s in summands:
            m = re.match(r"^(?P<factor>-?[0-9]*(\.[0-9]*)?)(?P<poly>(\*?L\([0-9]+,[a-z]\))*)", s)
            if not m:
                print(' polynomial misformatted')
                continue
            f = m.group('factor')
            print(' factor: ', f)
            p = m.group('poly')
            for l in re.finditer(r"L\((?P<a>[0-9]+),(?P<b>[a-z])\)", p):
                print(' poly: L(%s,%s)' % (l.group("a"), l.group("b")))
This should work for your given example file.
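One possible next step, once a factor and its L(n,var) pieces have been extracted, is to evaluate each summand numerically. The sketch below is a hypothetical illustration: it assumes that L(n, x) means the ordinary n-th Laguerre polynomial evaluated at x, and the names laguerre and eval_term as well as the variable values are invented for the example.

```python
import re


def laguerre(n, x):
    """Evaluate the n-th Laguerre polynomial at x using the recurrence
    (k + 1) * L[k+1](x) = (2k + 1 - x) * L[k](x) - k * L[k-1](x)."""
    prev, cur = 1.0, 1.0 - x  # L0 and L1
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, ((2 * k + 1 - x) * cur - k * prev) / (k + 1)
    return cur


POLY = re.compile(r"L\((\d+),([a-z])\)")


def eval_term(factor, poly, values):
    """Multiply a numeric factor by every L(n,var) found in poly,
    reading each variable's value from the `values` dict."""
    result = float(factor)
    for n, var in POLY.findall(poly):
        result *= laguerre(int(n), values[var])
    return result


# Hypothetical evaluation point for x and y.
print(eval_term("12.5", "*L(0,x)*L(1,y)", {"x": 1.0, "y": 2.0}))
```

Summing eval_term over all summands of F would then give the value of F at the chosen point.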

SET A, 0x1E vs SET A, 0x1F

This is my first attempt at DCPU-16; I'm checking the machine code generated from DCPU-16 assembly.
I am using this emulator: http://dcpu.ru/
I am trying to compare code generated by
SET A, 0x1E
SET A, 0x1F
The generated code is as follows:
fc01
7c01 001f
I don't get why the operand size changes between those two values.
That emulator appears to be using the next version of the DCPU-16 spec, which specifies that the same-word literal value for operand a permits values from 0xFFFF (-1) to 0x1E (30). This means that to get any literal value outside this range, the assembler has to use the next-word literal syntax, which makes the instruction one word longer.
0x1F (dec:31) is no longer a short literal (values -1 to 30), so it has to be read as a "next word" argument.
The opcodes are thus:
SET A, 0x1E
SET = 00001
A = 00000
1E = 111111
op = 1111110000000001 = fc01
SET A, 0x1F
SET = 00001
A = 00000
NW = 011111
op = 0111110000000001 = 7c01 + 001f
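The bit layout above can be reproduced with a small Python sketch. It is an illustration only: the function name is hypothetical, and the mapping of inline literals -1..30 onto the 6-bit codes 0x20..0x3F is an assumption consistent with the 1E = 111111 line above.

```python
def encode_set_a(literal):
    """Return the list of 16-bit words for `SET A, literal`, assuming the
    layout aaaaaa bbbbb ooooo (a = source, b = destination, o = opcode)."""
    SET, REG_A, NEXT_WORD = 0x01, 0x00, 0x1F
    literal &= 0xFFFF  # treat -1 as 0xFFFF
    if literal == 0xFFFF or literal <= 0x1E:
        # Assumed inline encoding: -1 -> 0x20, 0 -> 0x21, ..., 30 -> 0x3F.
        a = 0x20 + ((literal + 1) & 0xFFFF)
        return [(a << 10) | (REG_A << 5) | SET]
    # Any other value is stored in an extra word after the instruction.
    return [(NEXT_WORD << 10) | (REG_A << 5) | SET, literal]


for value in (0x1E, 0x1F):
    print(" ".join(format(word, "04x") for word in encode_set_a(value)))
```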
