I am constructing a general purpose function to read a text file, which may be Ascii, UTF-8 or UTF-16. (The encoding is known when the function is invoked). The file name may contain UTF8 characters, so the standard lua io functions are not a solution. I have no control over the Lua implementation (5.3) or the binary modules available in the environment.
My current code is:
require "luacom"
local function readTextFile(sPath, bUnicode, iBits)
local fso = luacom.CreateObject("Scripting.FileSystemObject")
if not fso:FileExists(sPath) then return false, "" end --check the file exists
local so = luacom.CreateObject("ADODB.Stream")
--so.CharSet defaults to Unicode aka utf-16
--so.Type defaults to text
so.Mode = 1 --adModeRead
if not bUnicode then
so.CharSet = "ascii"
elseif iBits == 8 then
so.CharSet = "utf-8"
end
so:Open()
so:LoadFromFile(sPath)
local contents = so:ReadText()
so:Close()
return true, contents
end
--test Unicode(utf-16) files
local file = "D:\\OneDrive\\Desktop\\utf16.txt" --this exists
local booOK, factsetcontents = readTextFile(file, true, 16)
When executed I get the error: COM exception:(d:\my\lua\luacom-master\src\library\tluacom.cpp,382):Operation is not allowed in this context on line 19 [local stream = so:LoadFromFile(sPath)]
I've pored over the ADO documentation and am obviously missing something that is staring me in the face! Is what I'm trying to do impossible?
ETA: If I comment out the line so.Mode = 1, this works. Which is great, but I don't understand why, which meaans I may end up making the same mistake unwittingly, whatever that mistake is!
I don't know about AdoDB Stream.Mode and why the function failed. But I think it's rather tricky to use a ADODB COM object on Windows to read ASCII/UTF8/UNICODE encoded files.
You can instead :
use standard Lua io.open function in binary mode and use manual decoding of the bytes content
use a binary module to do all the work
use a specific Lua implementation for Windows that can read/write those kind of encoded files natively, like LuaRT
Related
I'm using Lua in Scite on Windows, but hopefully this is a general Lua question.
Let's say I want to write a temporary string content to a temporary file in Lua - which I want to be eventually read by another program, - and I tried using io.tmpfile():
mytmpfile = assert( io.tmpfile() )
mytmpfile:write( MYTMPTEXT )
mytmpfile:seek("set", 0) -- back to start
print("mytmpfile" .. mytmpfile .. "<<<")
mytmpfile:close()
I like io.tmpfile() because it is noted in https://www.lua.org/pil/21.3.html :
The tmpfile function returns a handle for a temporary file, open in read/write mode. That file is automatically removed (deleted) when your program ends.
However, when I try to print mytmpfile, I get:
C:\Users\ME/sciteLuaFunctions.lua:956: attempt to concatenate a FILE* value (global 'mytmpfile')
>Lua: error occurred while processing command
I got the explanation for that here Re: path for io.tmpfile() ?:
how do I get the path used to generate the temp file created by io.tmpfile()
You can't. The whole point of tmpfile is to give you a file handle without
giving you the file name to avoid race conditions.
And indeed, on some OSes, the file has no name.
So, it will not be possible for me to use the filename of the tmpfile in a command line that should be ran by the OS, as in:
f = io.popen("python myprog.py " .. mytmpfile)
So my questions are:
Would it be somehow possible to specify this tmpfile file handle as the input argument for the externally ran program/script, say in io.popen - instead of using the (non-existing) tmpfile filename?
If above is not possible, what is the next best option (in terms of not having to maintain it, i.e. not having to remember to delete the file) for opening a temporary file in Lua?
You can get a temp filename with os.tmpname.
local n = os.tmpname()
local f = io.open(n, 'w+b')
f:write(....)
f:close()
os.remove(n)
If your purpose is sending some data to a python script, you can also use 'w' mode in popen.
--lua
local f = io.popen(prog, 'w')
f:write(....)
#python
import sys
data = sys.stdin.readline()
I am learning the Lua IO library. I'm having trouble with io.write(). In Programming Design in Lua, there is a piece of code that iterates through the file line by line and precedes each line with a serial number.
This is the file I`m working on:
test file: "iotest.txt"
This is my code
io.input("iotest.txt")
-- io.output("iotest.txt")
local count = 0
for line in io.lines() do
count=count+1
io.write(string.format("%6d ",count), line, "\n")
end
This is the result of the terminal display, but this result cannot be written to the file, whether I add IO. Output (" iotest.txt ") or not.
the results in terminal
This is the result of file, we can see there is no change
The result after code running
Just add io.flush() after your write operations to save the data to the file.
io.input("iotest.txt")
io.output("iotestout.txt")
local count = 0
for line in io.lines() do
count=count+1
io.write(string.format("%6d ",count), line, "\n")
end
io.flush()
io.close()
Refer to Lua 5.4 Reference Manual : 6.8 - Input and Output Facilities
io.flush() will save any written data to the output file which you set with io.output
See koyaanisqatsi's answer for the optional use of file handles. This becomes especially useful if you're working on multiple files at a time and gives you more control on how to interact with the file.
That said you should also have different files for input and output. You'll agree that it doesn't make sense to read and write from and to the same file alternatingly.
For writing to a file you need a file handle.
This handle comes from: io.open()
See: https://www.lua.org/manual/5.4/manual.html#6.8
A file handle has methods that acts on self.
Thats the function after the : at file handle.
So io.write() puts out on stdout and file:write() in a file.
Example function that can dump a defined function to a file...
fdump=function(func,path)
assert(type(func)=="function")
assert(type(path)=="string")
-- Get the file handle (file)
local file,err = io.open(path, "wb")
assert(file, err)
local chunk = string.dump(func,true)
file:write(chunk)
file:flush()
file:close()
return 'DONE'
end
Here are the methods, taken from io.stdin
close = function: 0x566032b0
seek = function: 0x566045f0
flush = function: 0x56603d10
setvbuf = function: 0x56604240
write = function: 0x56603e70
lines = function: 0x566040c0
read = function: 0x56603c90
This makes it able to use it directly like...
( Lua console: lua -i )
> do io.stdout:write('Input: ') local result=io.stdin:read() return result end
Input: d
d
You are trying to open the same file for reading and writing at the same time. You cannot do that.
There are two possible solutions:
Read from file X, iterate through it and write the result to another file Y.
Read the complete file X into memory, close file X, then delete file X, open the same filename for writing and write to it while iterating through the original file (in memory).
Otherwise, your approach is correct although file operations in Lua are more often done using io.open() and file handles instead of io.write() and io.read().
I am using the iconv C API and I want iconv to detect the local encoding of the computer. Is that possible? Apparently it is because when I look in the source code, I find in the file iconv_open1.h that if the fromcode or tocode variables are empty strings ("") then the local encoding is used using the locale_charset() function call.
Someone also told me that in order to convert the locale encoding to unicode, all I needed was to use iconv_open ("UTF-8", "")
Unfortunately, I find no mention of this in the documentation.
And when I convert some iso-8859-1 text to the locale encoding (which is utf-8 on my machine), then during conversion I get errno=EILSEQ (illegal sequence). I checked and iconv_open returned no error.
If instead of the empty string in iconv_open I specify "utf-8", then I get no error. Obviously iconv failed to detect my current charset.
edit: I checked with a simple C program that puts(nl_langinfo(CODESET)) and I get ANSI_X3.4-1968 (which is ASCII). Apparently, I got a problem with charset detection.
edit: this should be related to Why is nl_langinfo(CODESET) different from locale charmap?
additional information: my program is written in Ada, and I bind at link-time to C functions. Apparently, the locale setting is not initialized the same way in the Ada runtime and C runtime.
I'll take the same answer as in Why is nl_langinfo(CODESET) different from locale charmap?
You need to first call
setlocale(LC_ALL, "");
I read an Excel 2003 file with a text editor to see some markup language.
When I open the file in Excel it displays incorrect characters. On inspection of the file I see that the encoding is Windows 1252 or some such. If I manually replace this with UTF-8, my file opens fine. Ok, so far so good, I can correct the thing manually.
Now the trick is that this file is generated automatically, that I need to process it automatically (no human interaction) with limited tools on my desktop (no perl or other scripting language).
Is there any simple way to open this XL file in VBA with the correct encoding (and ignore the encoding specified in the file)?
Note, Workbook.ReloadAs does not function for me, it bails out on error (and requires manual action as the file is already open).
Or is the only way to correct the file to go through some hoops? Either: text in, check line for encoding string, replace if required, write each line to new file...; or export to csv, then import from csv again with specific encoding, save as xls?
Any hints appreciated.
EDIT:
ADODB did not work for me (XL says user defined type, not defined).
I solved my problem with a workaround:
name2 = Replace(name, ".xls", ".txt")
Set wb = Workbooks.Open(name, True, True) ' open read-only
Set ws = wb.Worksheets(1)
ws.SaveAs FileName:=name2, FileFormat:=xlCSV
wb.Close False ' close workbook without saving changes
Set wb = Nothing ' free memory
Workbooks.OpenText FileName:=name2, _
Origin:=65001, _
DataType:=xlDelimited, _
Comma:=True
Well I think you can do it from another workbook. Add a reference to AcitiveX Data Objects, then add this sub:
Sub Encode(ByVal sPath$, Optional SetChar$ = "UTF-8")
Dim stream As ADODB.stream
Set stream = New ADODB.stream
With stream
.Open
.LoadFromFile sPath ' Loads a File
.Charset = SetChar ' sets stream encoding (UTF-8)
.SaveToFile sPath, adSaveCreateOverWrite
.Close
End With
Set stream = Nothing
Workbooks.Open sPath
End Sub
Then call this sub with the path to file with the off encoding.
I have download and install KaZip2.0 on C++Builder2009 (with little minor changes => only set type String to AnsiString). I have write:
KAZip1->FileName = "test.zip";
KAZip1->CreateZip("test.zip");
KAZip1->Active = true;
KAZip1->Entries->AddFile("pack\\text.txt","xxx.txt");
KAZip1->Active = false;
KAZip1->Close();
now he create a test.zip with included xxx.txt (59byte original, 21byte packed). I open the archiv in WinRAR successful and want open the xxx.txt, but WinRAR says file is corrupt. :(
What is wrong? Can somebody help me?
Extract not working, because file is corrupt?
KAZip1->FileName = "test.zip";
KAZip1->Active = true;
KAZip1->Entries->ExtractToFile("xxx.txt","zzz.txt");
KAZip1->Active = false;
KAZip1->Close();
with little minor changes => only set
type String to AnsiString
Use RawByteString instead of AnsiString.
I have no idea how KaZip2.0 is implemented, but in general, to make a Delphi/C++ library that was designed without Unicode support in mind working properly you need to do two things:
Replace all Char with AnsiChar and all string to AnsiString
Replace all Win API calls with their Ansi variant, i.e. replace AWin32Function with AWin32FunctionA.
In Delphi < 2009, Char = AnsiChar, String = AnsiString, AWin32Function = AWin32FunctionA, but in Delphi >= 2009, by default, Char = WideChar, String = UnicodeString, AWin32Function = AWin32FunctionW.
WinRAR could be simply failing to recognize the header. Try opening it in Windows or some other zip programs.
with little minor changes => only set
type String to AnsiString
That's doesn't work always right, it may compile but it doesn't mean it will work right in D2009 or CB2009, you need to show the places that you convert Strings to AnsiStrings, specially the code deal with : Buffers, Streams and I/O.
It's not surprising that your code is wrong; KaZip has no documentation.
Proper code is:
//Create a new empty zip file
KAZip1->CreateZip("test.zip");
//Open our newly created zip file so we can add files to it
KAZIP1->Open("test.zip");
//Compress text.txt into xxx.txt
KAZip1->Entries->AddFile("pack\\text.txt","xxx.txt");
//Close the file stream
KAZip1->Close();