Reading non-printable characters from a file in Smalltalk - gnu-smalltalk

I have a function which outputs the integer 128 as a character into a file.
When I reopen this file to read that character with the next function, it reads the sequence of characters 60 49 54 114 48 48 56 48 62. When I output 127 into the file and then read it again, next correctly returns 127, so what is wrong with characters 128 and above? How can I read the actual integer value of the character back correctly?
Code for outputting an integer into a file
|inputfile args name|
args := Smalltalk arguments.
name := args at: 1.
inputfile := FileStream open: name mode: FileStream write.
inputfile << 128 asCharacter.
inputfile close.
Code for reading a file
|inputfile args name|
args := Smalltalk arguments.
name := args at: 1.
inputfile := FileStream open: name mode: FileStream read.
[inputfile atEnd not] whileTrue: [stdout << inputfile next asInteger].
inputfile close.
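For what it's worth, the nine codes in the output are not random: decoded as ASCII they spell out a printed character escape, which suggests that << is writing the character's display string rather than the raw byte (in GNU Smalltalk, nextPut: would write the character itself - stated here as an assumption about the asker's setup, not something tested against their files). A quick check, in Python purely as a neutral decoder:

```python
# The byte values the reader gets back, taken from the question:
codes = [60, 49, 54, 114, 48, 48, 56, 48, 62]

# Decoding them as ASCII shows the file contains a printed escape
# (Smalltalk radix notation for 16r80 = 128), not the raw byte 128:
text = ''.join(chr(c) for c in codes)
print(text)  # <16r0080>
```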

Related

Is it Possible to convert a set of numbers to their ASCII Char values?

I can convert a string into its ASCII values with spaces in between.
Example: "I like carrots"
To: 73 32 108 105...
But I'm trying to figure out a way to take those NUMBERS and convert them BACK into their ASCII chars.
I'm using the latest version of Delphi (Embarcadero) and am new to coding, so please help me out :)
You can use the Ord function on every character.
The Chr function returns the character for a specified ASCII value.
Example
var
  s: string;
  c: char;
begin
  s := 'I like carrots';
  for c in s do
    Writeln(Ord(c));
end;
Here is the code:
var
  S1: String;
  S2: String;
  A: TStringDynArray; // Add System.Types to the uses clause
  N: String;
  C: Char;
begin
  // 'I like carrots' in ASCII codes separated by spaces
  S1 := '73 32 108 105 107 101 32 99 97 114 114 111 116 115';
  A := SplitString(S1, ' '); // Add System.StrUtils to the uses clause
  S2 := '';
  for N in A do
    S2 := S2 + Char(StrToInt(N));
  ShowMessage(S2);
end;
I don't know the complete problem, but you could add code to validate the input string.
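The validation alluded to above can be sketched like this (Python used only as neutral pseudocode; in Delphi the same shape falls out of a TryStrToInt loop):

```python
def numbers_to_text(s):
    """Convert a space-separated list of ASCII codes back to a string,
    rejecting tokens that are not valid ASCII codes."""
    chars = []
    for token in s.split():
        if not token.isdigit():
            raise ValueError('not a number: %r' % token)
        code = int(token)
        if not 0 <= code <= 127:
            raise ValueError('not an ASCII code: %d' % code)
        chars.append(chr(code))
    return ''.join(chars)

print(numbers_to_text('73 32 108 105 107 101 32 99 97 114 114 111 116 115'))
# I like carrots
```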
I don't know much about Delphi, but take a look at this link: http://www.delphibasics.co.uk/Article.asp?Name=Text
In the Assigning to and from character variables section it demonstrates a function like so:
fromNum := Chr(65); // Assign using a function.
If that doesn't work, you could consider building a large int->char mapping. The beginnings of one are on that website as well, near the top.

OmniXML on iOS: Invalid Unicode

I recently switched to the OmniXML included with Delphi XE7, to allow targeting iOS. The XML data comes from a cloud service and includes nodes with base64-encoded binary data.
Now I get the exception "Invalid Unicode Character value for this platform" when calling XMLDocument.LoadFromStream, and it seems to be a carriage-return character reference in the base64 line breaks that fails.
The nodes with base64 data look similar to this:
<data>TVRMUQAAAAIAAAAAFFo3FAAUAAEA8AADsAAAAEAAAABAAHAAwABgAAAAAAAAAAAQEBAAAAAAAA
AAMQAAABNUgAAP/f/AAMABAoAAAAEAAAAAEVNVExNAAAAAQAAAAAUWjcUABQAAQD/wAA
AAA=</data>
I traced it down to these lines in XML.Internal.OmniXML:
psCharHexRef:
  if CharIs_WhiteSpace(ReadChar) then
    raise EXMLException.CreateParseError(INVALID_CHARACTER_ERR, MSG_E_UNEXPECTED_WHITESPACE, [])
  else
  begin
    case ReadChar of
      '0'..'9': CharRef := LongWord(CharRef shl 4) + LongWord(Ord(ReadChar) - 48);
      'A'..'F': CharRef := LongWord(CharRef shl 4) + LongWord(Ord(ReadChar) - 65 + 10);
      'a'..'f': CharRef := LongWord(CharRef shl 4) + LongWord(Ord(ReadChar) - 97 + 10);
      ';':
        if CharIs_Char(Char(CharRef)) then
        begin
          Result := Char(CharRef);
          Exit;
        end
        else
          raise EXMLException.CreateParseError(INVALID_CHARACTER_ERR, MSG_E_INVALID_UNICODE, []);
It is the exception in the last line that is raised, because CharIs_Char(#13) is false (where #13 is the value of CharRef read from the &#xD; character reference).
How do I solve this?
This is clearly a bug in OmniXML. It looks like the developers were trying to implement XML 1.0, which states:
...XML processors MUST accept any character in the range specified for Char.
Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
The implementation of CharIs_Char, however, looks like this:
function CharIs_Char(const ch: Char): Boolean;
begin
  // [2] Char - any Unicode character, excluding the surrogate blocks, FFFE, and FFFF
  Result := not ch.IsControl;
end;
This is excluding all control characters, which include #x9(TAB), #xA(LF) and #xD(CR). In fact, since XML strips (or optionally replaces with LF) carriage return literals during parsing, the only way to include an actual carriage return is using a character reference in an entity value literal (section 2.3 of the specification).
This seems like a showstopper and should be submitted as a QC report.
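The reasoning above is easy to verify against a parser that follows the XML 1.0 rules. Here is a small check, written in Python only because its expat-based parser is a convenient reference implementation:

```python
import xml.etree.ElementTree as ET

def is_xml_char(cp):
    """XML 1.0 production [2] Char, as quoted in the answer above."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# Carriage return is a valid Char even though it is a control character -
# exactly the case the "not IsControl" shortcut gets wrong:
assert is_xml_char(0xD)

# A literal CR in content is normalized away during parsing...
assert ET.fromstring('<d>a\rb</d>').text == 'a\nb'
# ...so a character reference is the only way to keep a real CR:
assert ET.fromstring('<d>a&#xD;b</d>').text == 'a\rb'
```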

Upgrade for file of records and backward compatibility

I have such file:
file of record
  Str: string[250];
  RecType: Cardinal;
end;
but after some time using this file, my customer found that Str never gets bigger than 100 chars, and he also needs additional fields.
In new version we have such file:
file of packed record
  Str: string[200];
  Reserved: array[1..47] of Byte;
  NewFiled: Cardinal;
  RecType: Cardinal;
end;
This record has the same size; in the previous record there was one unused padding byte between Str and RecType due to alignment.
Question: what happens when this new file is read by the old code? He needs backward compatibility.
Old code reading sample:
var
  FS: TFileStream;
  Rec: record
    Str: string[250];
    RecType: Cardinal;
  end;
...
// reading record by record from file:
FS.Read(Rec, SizeOf(Rec));
The old-school Pascal string (ShortString) uses the first byte of the string (index 0) to store the length of the string.
Let's look at the memory of this record:
byte:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  ........  243..246  247..250
value:  10  65  66  67  68  69  70  71  72  73  74   0 200 130  ........  NewField  RecType
From byte 11 to 242 the memory can contain garbage; it is simply ignored by the program (never shown), because the value 10 at byte 0 is taken as the length of the string, so the string becomes 'ABCDEFGHIJ'.
This ensures that an old program reading a file created with the newer version will never see garbage at the end of the strings: its view of each string is limited to the stored length, and the memory positions beyond it are just ignored.
You have to double-check that the old program does not change the stored values in case it writes the records back to the file. I think that is also safe, but I'm just not sure and have no Delphi at hand to test.
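The length-byte mechanics described above can be simulated in a few lines (a Python sketch of the ShortString layout, not Delphi's actual I/O):

```python
# A ShortString field is a length byte followed by a fixed-size payload.
# Simulate the old program reading the 250-char Str field from a record
# written by the new program, where only the first 10 payload bytes are real:
payload = b'ABCDEFGHIJ' + b'\x00' * 240   # bytes 11..250 may hold anything
field = bytes([10]) + payload             # byte 0 holds the length, 10

length = field[0]
text = field[1:1 + length].decode('ascii')
print(text)  # ABCDEFGHIJ
```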

CharInSet accepting Unicode NULL character

I'm reading some data from memory, and this area of memory is in Unicode. So to build one ANSI string I need something like this:
while CharInSet(Chr(Ord(Buff[aux])), ['0'..'9', #0]) do
begin
  Target := Target + Chr(Ord(Buff[aux]));
  inc(aux);
end;
where Buff is an array of Byte and Target is a string. I just want to keep reading Buff and appending to Target while the value is '0'..'9', but when it finds a NUL byte (00) it just stops. How can I keep adding data to Target until the first letter or other non-numeric character? The #0 has no effect.
I would not even bother with CharInSet() since you are dealing with bytes and not characters:
var
  b: Byte;

while aux < Length(Buff) do
begin
  b := Buff[aux];
  if ((b >= Ord('0')) and (b <= Ord('9'))) or (b = 0) then
  begin
    Target := Target + Char(Buff[aux]);
    Inc(aux);
  end else
    Break;
end;
If your data is Unicode, then I am assuming that the encoding is UTF-16. In which case you cannot process it byte by byte. A character unit is 2 bytes wide. Put the data into a Delphi string first, and then parse it:
var
  str: string;
....
SetString(str, PChar(Buff), Length(Buff) div SizeOf(Char));
Do it this way and your loop can look like this:
for i := 1 to Length(str) do
  if not CharInSet(str[i], ['0'..'9']) then
  begin
    SetLength(str, i-1);
    break;
  end;
I believe that your confusion was caused by processing byte by byte. With UTF-16 encoded text, ASCII characters are encoded as a pair of bytes, the most significant of which is zero. I suspect that explains what you were trying to achieve with your CharInSet call.
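To visualize the byte pattern behind that explanation (Python here, purely to illustrate the encoding):

```python
# ASCII text encoded as UTF-16LE: every character becomes two bytes,
# the second of which is zero - the #0 bytes the question stumbled over.
buff = '123x'.encode('utf-16-le')
print(list(buff))  # [49, 0, 50, 0, 51, 0, 120, 0]

# Decode first, then scan characters (mirroring the SetString approach):
s = buff.decode('utf-16-le')
digits = s[:next((i for i, c in enumerate(s) if not c.isdigit()), len(s))]
print(digits)  # 123
```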
If you want to cater for other digit characters then you can use the Character unit and test with TCharacter.IsDigit().

TStringList problem with values at index

So I have several summary files that I want to read and get the values from.
I am doing the following:
OutputSummary := TStringList.Create;
for idx := 0 to 82 do
  OutputSummary.Insert(idx, '');
to initialize the values I'm using.
Then I have a loop:
for idx := 0 to SummaryFiles.Count - 1 do
begin
  AssignFile(finp, SummaryFiles[idx]);
  ReSet(finp);
  for ndx := 0 to 5 do
    ReadLn(finp, buff);
  for ndx := 0 to 82 do
  begin
    ReadLn(finp, buff);
    temp := GetToken(buff, ' ');
    buff := GetRemains(buff, '|');
    temp := GetToken(buff, '|');
    valuestring := OutputSummary[ndx] + delimiter + temp;
    OutputSummary.Insert(ndx, valuestring);
  end;
  CloseFile(finp);
end;
The first 0 to 5 loop skips the header lines I don't want to read, and the 0 to 82 loop reads lines that look like
1. Initial Wait List|1770
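For reference, the extraction those GetToken/GetRemains calls perform on such a line boils down to this (a Python sketch; the actual helper functions are not shown in the question):

```python
# Hypothetical stand-in for the GetToken/GetRemains pair: each summary
# line is "label|value", and only the value is appended per file.
def value_of(line):
    return line.split('|', 1)[1].strip()

print(value_of('1. Initial Wait List|1770'))  # 1770
```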
So I was debugging the program to see how it works with just 2 SummaryFiles.
The first time through, it works perfectly. The line is read correctly, I get the value, and when I insert valuestring it looks like ",1770" (for example); I can also inspect OutputSummary[ndx] after the Insert call and see that the value was inserted correctly.
Then I open the second file, which also works fine until the line
valuestring := OutputSummary[ndx] + delimiter + temp;
The first time, OutputSummary[0] is correct and the correct line is added.
However, OutputSummary[1] through OutputSummary[82] are all the same as OutputSummary[0]! This makes no sense, since when I first added those values I could see that OutputSummary[1] through [82] were unique and correct.
Can anyone see a problem? Is it a debugger error? Am I just missing something obvious that I don't see?
thanks
It looks to me like you're trying to create a table of some sort, with one column per input file and one row per line in the file, with the columns separated by the delimiter. If so, calling .Insert on the string list isn't going to quite work right, since you'll end up inserting 83 * SummaryFiles.Count rows.
Instead of the Insert call, you need something like this:
if OutputSummary.Count > ndx then
  OutputSummary[ndx] := valuestring
else
  OutputSummary.Add(valuestring);
See if that helps.
Also, you might want to consider replacing the "magic number" 82 with a meaningful constant, like const LINES_TO_READ = 82. That makes it easier to read the code and understand what it's supposed to be doing.
