Reading files per character with Lua (Löve API)

I am trying to read files character by character in Lua with the Löve API, but I just can't figure out how. There must be some way to do this, right? In other posts I found something about reading files line by line, but I really need to read them per character. Could someone please tell me how to do this?
Thanks in advance,
Leveljaap

If f is a file handle, then f:read(1) returns the next byte in the file or nil at the end of the file.
Note that the next byte is not necessarily the next character: if the file contains UTF-8 text, for instance, a single character can take up several bytes.
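For comparison, here is the byte-versus-character distinction sketched in Python rather than Lua, so it says nothing about the Löve API itself. Opening a file in binary mode and calling `read(1)` behaves like Lua's `f:read(1)` and returns one byte at a time. Opening it in text mode with an encoding makes `read(1)` return one decoded character, so a multi-byte UTF-8 character comes back whole. The file name `sample.txt` and its contents are only an illustration.

```python
import os
import tempfile

def read_bytes(path):
    """Yield the file one byte at a time, like Lua's f:read(1)."""
    with open(path, "rb") as f:
        while True:
            b = f.read(1)
            if not b:          # b"" means end of file (Lua returns nil instead)
                break
            yield b

def read_chars(path, encoding="utf-8"):
    """Yield the file one decoded character at a time."""
    # newline="" turns off newline translation so "\r\n" is not rewritten.
    with open(path, "r", encoding=encoding, newline="") as f:
        while True:
            c = f.read(1)      # in text mode, read(1) returns one character
            if not c:
                break
            yield c

# Hypothetical sample file: "a" is one byte in UTF-8, "é" is two (0xC3 0xA9).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("aé".encode("utf-8"))

print(list(read_bytes(path)))  # [b'a', b'\xc3', b'\xa9']
print(list(read_chars(path)))  # ['a', 'é']
```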

Related

Read/Parse Binary files with PowerShell

I'm trying to parse a binary file, and I need some help on where to go. I've been looking online for "parsing binary files", "reading binary files", "reading text inside binaries", etc., and I haven't had any luck.
For example, how would I read this text out of this binary file? Any help would be MUCH appreciated. I am using PowerShell.
It seems that you have a binary file with text at a fixed or otherwise deducible position. Get-Content might help you, but it will try to parse the entire file into an array of strings, producing an array of "garbage". Also, you wouldn't know at which file position a particular run of characters started.
You can try the .NET classes File (to read) and Encoding (to decode). Each is just one line:
```powershell
# Read the entire file into an array of bytes.
$bytes = [System.IO.File]::ReadAllBytes("path_to_the_file")
# Decode the first 12 bytes as text, assuming ASCII encoding.
$text = [System.Text.Encoding]::ASCII.GetString($bytes, 0, 12)
```
In your real case you'd probably loop through the byte array to find the start and end of a particular string, then pass the start index and length to GetString to specify the range of bytes to decode.
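A minimal sketch of that idea in Python: load the bytes, find a start marker, find the end marker after it, and decode only the bytes in between. The markers `NAME=` and the NUL terminator are hypothetical; in a real file you would use whatever delimits the text you are after. ASCII is assumed as the encoding, as in the GetString call. Returning the offset as well answers the "where in the file was it" problem.

```python
def extract_between(data: bytes, start_marker: bytes, end_marker: bytes,
                    encoding: str = "ascii"):
    """Return (offset, text) for the text between start_marker and the next end_marker, or None."""
    start = data.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)           # the text begins right after the marker
    end = data.find(end_marker, start)   # look for the terminator only past the start
    if end == -1:
        return None
    return start, data[start:end].decode(encoding)

# With a real file: data = open("path_to_the_file", "rb").read()
data = b"\x00\x01NAME=Widget\x00\xff"   # hypothetical binary content
print(extract_between(data, b"NAME=", b"\x00"))  # (7, 'Widget')
```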
The .NET methods I mentioned are available in .NET Framework 2.0 or higher. If you installed PowerShell 2.0 you already have it.
If you're just looking for strings, check out the strings.exe utility from Sysinternals.
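The core of what a strings-style tool does can be sketched in Python: scan the bytes for runs of printable ASCII characters of some minimum length and report each run with its offset. The minimum length of 4 is an assumption; pick whatever filters out noise in your file. This sketch handles single-byte ASCII text only, not UTF-16 strings.

```python
import re

def find_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for every run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)   # space through tilde
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

sample = b"\x00\x01hello\x00ab\xffworld!\x00"   # hypothetical binary content
print(find_strings(sample))   # [(2, 'hello'), (11, 'world!')]
```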
You can read the file as bytes with Get-Content -Encoding Byte (in PowerShell 6 and later, use -AsByteStream instead). I'm not sure how to parse it, though.

Strip Some ASCII Codes from File Efficiently?

I have an on-disk file of 100 MB (it can be up to 300 MB). There are nulls and some other control characters in it that should not be there. At first I read the whole file into a string in memory, walked it char by char, appended everything except the offending chars to a StringBuilder, and then called ToString on that.
That uses too much memory, of course. I need to figure out how to strip out the bad ASCII values on disk. Maybe a .NET 4 memory-mapped file stream is the right tool (I looked into it a while ago in the question "Memory Mapped File to Read End of File?")?
All ideas appreciated. Thanks.
If removing the bad characters shrinks the file, simply read the file a character or a block at a time and write it out to a new file, skipping the bad characters.
This also gives you an undo, since the original file is left untouched!
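A minimal sketch of the copy-and-filter approach in Python: read fixed-size blocks, delete the unwanted bytes from each block, and append the rest to a new file, so memory use is bounded by the block size rather than the file size. Which bytes count as "bad" is an assumption here (NUL and the other control codes below 32, except tab, LF and CR); adjust `BAD_BYTES` to your data. Because bytes are removed individually, block boundaries do not matter.

```python
import os
import tempfile

# Assumption: "bad" = NUL and the other control codes below 32, except \t, \n and \r.
BAD_BYTES = bytes(b for b in range(32) if b not in (9, 10, 13))

def strip_to_new_file(src, dst, bad=BAD_BYTES, block_size=1 << 20):
    """Copy src to dst in block_size chunks, dropping every byte that appears in bad."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            fout.write(block.translate(None, bad))  # table None: only delete bytes

tmp = tempfile.mkdtemp()
src, dst = os.path.join(tmp, "dirty.txt"), os.path.join(tmp, "clean.txt")
with open(src, "wb") as f:
    f.write(b"a\x00b\tc\x01\n")
strip_to_new_file(src, dst)
with open(dst, "rb") as f:
    print(f.read())   # b'ab\tc\n'
```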
If you can replace bad characters in place, so that the length of the file doesn't change, then map the file into memory and scan over it, replacing each bad character with e.g. a space (ASCII 32). This is simpler and probably faster, but either way the run time will be dominated by raw disk I/O.
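The in-place variant can be sketched with Python's `mmap` module: map the file, then rewrite it chunk by chunk through a 256-entry translation table that turns each bad byte into a space and leaves every other byte unchanged, so the file length never changes. `BAD_BYTES` is an assumption (control codes below 32 except tab, LF and CR).

```python
import mmap
import os
import tempfile

# Assumption: "bad" = control codes below 32, except \t, \n and \r.
BAD_BYTES = bytes(b for b in range(32) if b not in (9, 10, 13))
# Translation table: each bad byte becomes a space (32), everything else maps to itself.
TABLE = bytes(32 if b in BAD_BYTES else b for b in range(256))

def replace_in_place(path, table=TABLE, chunk=1 << 20):
    """Overwrite every bad byte in the file with a space, without changing its length."""
    if os.path.getsize(path) == 0:
        return                                  # an empty file cannot be mapped
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:  # 0 = map whole file
        for start in range(0, len(mm), chunk):
            end = min(start + chunk, len(mm))
            mm[start:end] = mm[start:end].translate(table)
        mm.flush()                              # push the changes back to the file

path = os.path.join(tempfile.mkdtemp(), "dirty.txt")
with open(path, "wb") as f:
    f.write(b"a\x00b\tc\x01\n")
replace_in_place(path)
with open(path, "rb") as f:
    print(f.read())   # b'a b\tc \n'
```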

Bad characters from CSV into database

I am trying to figure out why I keep getting bad characters when I import information into my database from a CSV file.
Setup:
Database: UTF-8 encoding
HTML page: UTF-8 encoding (meta tag)
What I'm receiving when the file is imported is.
But in the CSV file everything looks clean, and the actual number is +1 (250) 862-8350.
So I don't know what the issue is. My hunch is that it has something to do with some form of trimming, but I haven't been able to figure out what. Any light would be appreciated!
Well, I found my answer, and it's somewhat embarrassing. Before my phone number goes into the database, I run it through my cleaner and then encode the data. What I didn't notice was that my database column was limited to a small character count, and the encoded value was longer than what the column could hold, so it got cut off. In short, I widened my column from 32 to 64 characters and that solved the problem.
Thank you for your time in trying to help me though!
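A small guard would surface this kind of problem immediately: compute the encoded value first and refuse to insert it if it is longer than the column allows, instead of letting it be cut off. The post doesn't say which encoding was used, so base64 stands in for it below; `encode_for_column` is a hypothetical helper and the 32 and 16 widths are just examples.

```python
import base64

def encode_for_column(value: str, max_chars: int) -> str:
    """Encode value (base64 as a stand-in) and fail loudly if it won't fit in max_chars."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    if len(encoded) > max_chars:
        raise ValueError(
            f"encoded value is {len(encoded)} chars but the column holds {max_chars}")
    return encoded

phone = "+1 (250) 862-8350"
print(len(encode_for_column(phone, 32)))   # 24 (17 input bytes become 24 base64 chars)
try:
    encode_for_column(phone, 16)
except ValueError as err:
    print(err)   # encoded value is 24 chars but the column holds 16
```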

Erlang, reading a file with character offset

I have code that finds a specific occurrence of text in a file and gives me an offset, so I know where this occurrence ends. Now I want to read the file from that offset to the end of the file. The file contains binary data as well as text. How do I do this in Erlang?
Use file:pread/3 (see the Erlang documentation for the file module). You have to take care of any character encoding yourself, as the function deals only with bytes.
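The same find-then-read-from-offset pattern, sketched in Python rather than Erlang: find where the marker ends in the raw bytes, then seek to that offset and read everything after it. As with pread, everything here is bytes, so decoding any text portion is up to you. The `MARK` marker and the file contents are hypothetical. On Unix, Python's `os.pread` is the closer analogue to pread; seek plus read is used here because it is portable.

```python
import os
import tempfile

def end_of_marker(data: bytes, marker: bytes) -> int:
    """Offset just past the first occurrence of marker, or -1 if it is absent."""
    pos = data.find(marker)
    return -1 if pos == -1 else pos + len(marker)

def read_from_offset(path, offset):
    """Return every byte from offset to the end of the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "mixed.bin")
with open(path, "wb") as f:
    f.write(b"HDR\x00\x01MARKpayload\xff")    # hypothetical mix of text and binary

with open(path, "rb") as f:
    offset = end_of_marker(f.read(), b"MARK")
print(offset)                          # 9
print(read_from_offset(path, offset))  # b'payload\xff'
```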

What's the format of the OpenOffice dictionaries?

Does anyone know what the format of the OpenOffice dictionary files is? As far as I can see there is one word per line, plus some flags that presumably tell me something about the word.
Here are a couple of lines from the English dictionary as an example:
absoluteness/S
absorbency/SM
abstract/ShTVDPiGY
absurdness/S
And from the Norwegian dictionary, which is what I'll use:
flatorm/AEG
flatpresse/W
flatseng/ACEG
flatside/ACDEFGHJ
flatskjerm/A
What does, for instance, "/AEG" or "/S" mean? I assume each letter/flag has a certain meaning, so that the A in "/AEG" means the same as the A in "/ACDEFGHJ".
I have googled all over the place, but I can't find any information.
OpenOffice uses the Hunspell engine for spell checking. The flags after the "/" refer to affix rules defined in the corresponding affix (.aff) file.
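To make the link concrete, here is a rough Python sketch, assuming the default one-character flag format (Hunspell's FLAG option can switch to other formats): split each .dic entry at the "/" into the word and its flags, then apply a suffix rule given as the strip, add and condition fields of an SFX line from the .aff file. The rule values below are illustrative rather than copied from a specific dictionary, and the sketch ignores cross-product settings and morphological fields.

```python
import re

def parse_dic_line(line):
    """Split a .dic entry like 'flatorm/AEG' into ('flatorm', {'A', 'E', 'G'})."""
    word, _, flags = line.strip().partition("/")
    return word, set(flags)          # one character per flag (default flag format)

def apply_suffix(word, strip, add, condition):
    """Apply one SFX rule: strip/add fields ('0' means empty) and an end-of-word condition."""
    # The condition syntax ('.', '[abc]', '[^abc]') is treated as a Python regex here.
    if not re.search("(?:" + condition + ")$", word):
        return None
    if strip != "0":
        if not word.endswith(strip):
            return None
        word = word[: -len(strip)]
    return word + ("" if add == "0" else add)

word, flags = parse_dic_line("flatorm/AEG")
print(word, sorted(flags))                                # flatorm ['A', 'E', 'G']
print(apply_suffix("fly", "y", "ies", "[^aeiou]y"))       # flies
print(apply_suffix("absoluteness", "0", "es", "[sxzh]"))  # absolutenesses
```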
