Integer-String Parser - parsing

I am parsing an XML document, and when I try to store the String (data) that I am reading as an Integer, I get an exception.
Integer.parseInt(data)
I also tried changing the attribute type to Byte and parsing it another way, but that did not work either.
Byte.parseByte(data)
More precisely, I am doing:
message.setType(Integer.parseInt(data));
I have also tried valueOf and intValue, and neither worked.
In each attempt the type field was declared to match (int or byte).
While parsing the XML, I print out data, and it has exactly the value and length that I expect (value: 9, length: 1).
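The question does not show the exception type or message, so the cause cannot be determined from the text. As a diagnostic sketch only, the following Java code lists the code points of the string (to reveal characters that do not show up when printed) and reports a failed parse instead of throwing. The class ParseDiagnostics and its methods describe and tryParse are hypothetical names introduced for illustration, and the assumption that the exception is a NumberFormatException is not confirmed by the question.

```java
final class ParseDiagnostics {

    // Hypothetical helper: renders every code point as U+XXXX so that
    // invisible characters (BOM, non-breaking space, ...) become visible.
    static String describe(String data) {
        StringBuilder sb = new StringBuilder();
        data.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Hypothetical helper: trims ASCII whitespace, then parses.
    // Returns null instead of throwing when the text is not a valid int.
    static Integer tryParse(String data) {
        try {
            return Integer.parseInt(data.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String data = "9"; // stand-in for the value read from the XML
        System.out.println("code points: " + describe(data));
        System.out.println("parsed: " + tryParse(data));
    }
}
```

If describe shows only U+0039 and the parse still fails in the real program, the exception is probably not coming from the string itself (for example, message could be null), which is worth checking in the stack trace.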

Related

How to correctly parse a sequence of arbitrary bytes

Edit: changed title and question to make them more general and easier to find when looking for this specific issue
I'm parsing a sequence of arbitrary bytes (u8) from a file, and I've come to a point where I had to use std::str::from_utf8_unchecked (which is unsafe) instead of the usual std::str::from_utf8, because it seemingly was the only way to make the program work.
fn parse(seq: &[u8]) {
    . . .
    let data: Vec<u8> = unsafe {
        str::from_utf8_unchecked(&value[8..data_end_index])
            .as_bytes()
            .to_vec()
    };
    . . .
}
This, however, is probably leading to inconsistencies in actual use of the program, because with non-trivial use I get an io::Error of kind InvalidData with the message "stream did not contain valid UTF-8".
In the end, the issue was that I was converting the sequence of arbitrary bytes into a string, which in Rust must contain only valid UTF-8. I did the conversion because I thought I could call to_vec only after as_bytes, without realizing that a &[u8] is already a slice of bytes.
So I was converting the byte slice into a Vec, but only after a useless and harmful detour through a string. That detour makes no sense, because not every byte sequence is valid UTF-8, so arbitrary bytes cannot in general be turned into a Rust string.
The correct code is the following:
let data: Vec<u8> = value[8..data_end_index].to_vec();

Conversion of a sequence of bytes to an ASCII string in Lua

I am trying to write a custom dissector for Wireshark that changes the byte/hex output to an ASCII string.
I was able to write the body of this dissector and it works. My only problem is converting this data to an ASCII string.
Wireshark declares this data to be a sequence of bytes.
To Lua, the data type is userdata (tested using type(data)).
If I simply convert it to a string using tostring(data), my dissector returns 24:50:48, which is the exact hex representation of the bytes in the array.
Is there any way to convert this byte sequence directly to ASCII, or can you help me convert this colon-separated string to an ASCII string? I am totally new to Lua. I've tried something like split(tostring(data),":"), but this returns Lua Error: attempt to call global 'split' (a nil value)
Using Jakuje's answer I was able to create something like this:
function isempty(s)
    return s == nil or s == ''
end
data = "24:50:48:49:4A"
s = ""
for i in string.gmatch(data, "[^:]*") do
    if not isempty( i ) then
        print(string.char(tonumber(i,16)))
        s = s .. string.char(tonumber(i,16))
    end
end
print( s )
I am not sure if this is efficient, but at least it works ;)
There is no such function as split in Lua (consulting the reference manual is a good start). You should probably use the string.gmatch function, as described on the wiki:
data = "24:50:48"
for i in string.gmatch(data, "[^:]*") do
    print(i)
end
Furthermore, you are looking for the string.char function to convert byte values to ASCII characters.
You need to mark the range of bytes in the buffer that you're interested in and convert it to the type you want:
data:range(offset, length):string()
-- or just call it, which works the same thanks to __call metamethod
data(offset, length):string()
See TvbRange description in https://wiki.wireshark.org/LuaAPI/Tvb for full list of available methods of converting buffer range data to different types.

Using sqlite3_column_text() to fetch integer value

I have a table with TEXT, INTEGER, REAL and other data types.
I wrote a generic sql function to read results for all queries using sqlite3_column_text() like so:
char *dataAsChar = (char *) sqlite3_column_text(compiledStatement, ii);
Although I should be using sqlite3_column_int etc. to read the numeric values, the above code seems to work for me. I get the numeric values as strings, which I later convert to int using [numberAsString intValue].
Since I am using a generic function to read all my db values, this is very convenient for me. But is there something that can go wrong with my code?
I could use sqlite3_column_type for each column to determine the type and use the appropriate function. Am I correct in assuming that sqlite3_column_text basically returns the column value in TEXT format and does not require the stored value itself to be TEXT?
The only situation where I can see this implementation failing is with BLOB data type.
The documentation says:
These routines attempt to convert the value where appropriate. […]
The following table details the conversions that are applied:
Internal Type    Requested Type    Conversion
NULL             TEXT              Result is a NULL pointer
INTEGER          TEXT              ASCII rendering of the integer
FLOAT            TEXT              ASCII rendering of the float
BLOB             TEXT              Add a zero terminator if needed

NSSet full of NSStrings. When I print the set to the console, the results are unexpected

Here's what happens:
Internal database stuff: one class has a string property on it, that stores a phone number. This number is set using the code
CFBridgingRelease(ABMultiValueCopyValueAtIndex(ABRecordCopyValue(record, kABPersonPhoneProperty), 0));
My function: finds all objects of this type, and stores phone numbers of each object in an NSMutableSet.
Debug: I print the description of the set to the console.
Results:
Some of the set's objects look as expected (the majority actually): "+64 27 0124 975"
Some are missing quotation marks: 027 7824 565
Some have weird unicode symbols: "021\U00a0026\U00a017788"
My question:
Why the difference - what does it mean, and do I need to fix anything?
NSLog with %@ (as I assume you are using) has some intelligence in how it presents NSStrings, as it calls the description method. If the string has anything other than alphanumerics, such as the '+' or '\' above, it will use quotes. The string with Unicode characters simply has its characters encoded as shown, and they are automatically converted into this lossless format. If you really need to, you should be able to convert it to something prettier for the console with something like this:
NSLog(@"%@", [NSString stringWithCString:[myString.description cStringUsingEncoding:NSASCIIStringEncoding] encoding:NSNonLossyASCIIStringEncoding]);

Handling strings with binary data in them using java.nio

I am having issues parsing text files that have illegal characters (binary markers) in them. An example would be something like the following:
test.csv
^000000^id1,text1,text2,text3
Here the ^000000^ is a textual representation of illegal characters in the source file.
I was thinking about using the java.nio to validate the line before I process it. So, I was thinking of introducing a Validator trait as follows:
import java.nio.charset._
trait Validator {
  private def encoder = Charset.forName("UTF-8").newEncoder
  def isValidEncoding(line: String): Boolean = {
    encoder.canEncode(line)
  }
}
Do you guys think this is the correct approach to handle the situation?
Thanks
Once you already have a String it is too late: UTF-8 can always encode any string*. You need to go back to the point where the file is initially decoded.
ISO-8859-1 is an encoding with interesting properties:
Literally any byte sequence is valid ISO-8859-1
The code point of each decoded character is exactly the same as the value of the byte it was decoded from
So you could decode the file as ISO-8859-1 and just strip non-English characters:
//Pseudo code
str = file.decode("ISO-8859-1");
str = str.replace( "[\u0000-\u0019\u007F-\u00FF]", "");
You can also iterate line-by-line, and ignore each line that contains a character in [\u0000-\u0019\u007F-\u00FF], if that's what you mean by validating a line before processing it.
It also occurred to me that the binary marker could be a BOM. You can use a hex editor to view the values.
*Except those with illegal surrogates which is probably not the case here.
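The ISO-8859-1 suggestion in this answer can be sketched in Java roughly as follows. This is a minimal illustration, not the answerer's code: the class Latin1Cleaner and its methods decodeAndStrip and isClean are hypothetical names, and the character class is copied as-is from the pseudo code above (so it treats \u001A-\u001F as acceptable, which may or may not be intended).

```java
import java.nio.charset.StandardCharsets;

final class Latin1Cleaner {

    // Character class taken verbatim from the answer's pseudo code.
    // The double backslashes let the regex engine interpret the escapes.
    private static final String UNWANTED = "[\\u0000-\\u0019\\u007F-\\u00FF]";

    // Decodes raw bytes as ISO-8859-1 (never fails, one char per byte)
    // and removes every character matched by UNWANTED.
    static String decodeAndStrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.ISO_8859_1);
        return decoded.replaceAll(UNWANTED, "");
    }

    // Line-level variant: true when the line contains no unwanted character,
    // so the caller can skip lines that fail the check.
    static boolean isClean(byte[] rawLine) {
        String decoded = new String(rawLine, StandardCharsets.ISO_8859_1);
        return decoded.equals(decoded.replaceAll(UNWANTED, ""));
    }

    public static void main(String[] args) {
        // Made-up sample: three binary bytes followed by ordinary CSV text.
        byte[] raw = {0x00, 0x00, (byte) 0xFF, 'i', 'd', '1', ',', 't', 'e', 'x', 't', '1'};
        System.out.println(decodeAndStrip(raw));
        System.out.println(isClean(raw));
    }
}
```

Because the file is read as bytes and decoded explicitly, the binary markers are removed before any other part of the program sees a String.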
Binary data is not a string. Don't try to hack around input sequences that would be illegal upon conversion to a String.
If your input is an arbitrary sequence of bytes (even if many of them conform to ASCII), don't even try to convert it to a String.
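If the goal is only to find out whether a chunk of bytes is text at all before converting it, one option is to run a strict decoder over the raw bytes and keep them as bytes when it fails. The sketch below assumes the text is expected to be UTF-8; the class Utf8Check and the method isValidUtf8 are hypothetical names used for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {

    // Returns true only if the bytes form well-formed UTF-8.
    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] raw) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = "id1,text1".getBytes(StandardCharsets.UTF_8);
        byte[] binary = {(byte) 0xFF, 'i', 'd', '1'}; // 0xFF never occurs in UTF-8
        System.out.println(isValidUtf8(text));
        System.out.println(isValidUtf8(binary));
    }
}
```

Note that this only detects malformed UTF-8. Bytes such as 0x00 are valid UTF-8, so markers made of NUL bytes would pass this check and need a separate test on the byte values themselves.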

Resources