Encoding a JMS TextMessage - character-encoding

I'm receiving messages from a JMS MQ queue which are supposedly UTF-8 encoded. However, on reading them out using msgText = ((TextMessage)msg).getText();
I get question marks where non-standard characters were present. It seems possible to specify the encoding when using a BytesMessage, but I can't find a way to specify the encoding while reading out the TextMessage. Is there a way to solve this, or should I press for BytesMessages?

We tried adding -Dfile.encoding="UTF-8" to WebSphere's JVM arguments, and we added
source = new StreamSource(new ByteArrayInputStream(
        ((TextMessage) msg).getText().getBytes("UTF-8")));
to our MessageListener. This worked for us, so we then took the -Dfile.encoding setting out again, and it still works.
Because we prefer minimal configuration for WebSphere, we decided to leave it this way, also taking into account that we can more easily replace the hard-coded "UTF-8" string with a setting from a file or database.

If the text is not decoded correctly, then the client is probably not sending the message with the UTF-8 codec. If the UTF-8 bytes were decoded as ISO-8859-1 on the way in, this should work:
byte[] by = ((TextMessage) msg).getText().getBytes("ISO-8859-1");
String text = new String(by,"UTF-8");
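A self-contained sketch of why this re-decoding can work, assuming the text really was UTF-8 on the wire and was decoded as ISO-8859-1 somewhere along the way. The class and method names (MojibakeRepair, repair) are made up for illustration. Note that it only helps while the wrong decoding was lossless: once a character has been replaced by a literal question mark, the original bytes are gone.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {

    // Reverses a wrong ISO-8859-1 decoding of UTF-8 bytes.
    // ISO-8859-1 maps every byte to exactly one char, so getBytes() recovers the original bytes.
    public static String repair(String wronglyDecoded) {
        byte[] originalBytes = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an e-acute

        // Simulate the fault: UTF-8 bytes decoded with the wrong charset.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(garbled.length());                  // 5, the e-acute became two chars
        System.out.println(repair(garbled).equals(original));  // true
    }
}
```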

Unrecognised header byte error when trying to decode an Avro message in Spring Cloud Stream

I am trying to write a test case for my Spring Cloud Stream application. I am using Confluent Schema Registry with Avro, so I need to decode the message after polling from the channel. Here is my code:
processor.input()
.send(MessageBuilder.withPayload(InputData).build());
Message<?> message = messageCollector.forChannel(processor.output()).poll();
BinaryMessageDecoder<OutputData> decoder = OutputData.getDecoder();
OutputData outputObject = decoder.decode((byte[]) message.getPayload());
For some reason this code throws
org.apache.avro.message.BadHeaderException: Unrecognized header bytes: 0x00 0x08
I am not sure if this is some sort of bug I am facing or I am not following a proper way to decode the received avro message. I suspect I need to set header with something, but I am not quite sure how and with what exactly. I would appreciate it if someone could help me with this matter.
P.S: I am using spring-cloud-stream-test-support for the purpose of this test.
The data won't be avro-encoded when using the test binder.
The test binder is very limited.
To properly test end-to-end with avro, you should remove the test binder and use the real kafka binder with an embedded kafka broker.
One of the sample apps shows how to do it.
It turns out that the issue was related to how I was trying to decode the Avro message. By using the official Avro libraries, the following code worked for me:
Decoder decoder = DecoderFactory.get().binaryDecoder((byte[]) message.getPayload(), null);
DatumReader<OutputData> reader = new SpecificDatumReader<>(OutputData.getClassSchema());
OutputData outputObject = reader.read(null, decoder);
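For background on why the two decoders behave differently: BinaryMessageDecoder reads Avro's single-object encoding, which per the Avro specification starts with the marker bytes 0xC3 0x01 followed by an 8-byte schema fingerprint, while binaryDecoder reads the bare binary encoding with no header. A payload starting with 0x00 0x08, as in the exception above, does not carry that marker. A rough sketch of such a check, with no Avro dependency; the class and method names are hypothetical, and only the first two bytes of the sample payload come from the question (the rest is filler):

```java
public class AvroHeaderCheck {

    // Marker bytes that open an Avro single-object encoded message (per the Avro specification).
    private static final int MARKER_0 = 0xC3;
    private static final int MARKER_1 = 0x01;

    // True if the payload is long enough for the 10-byte header and starts with the marker.
    public static boolean hasSingleObjectHeader(byte[] payload) {
        return payload != null
                && payload.length >= 10
                && (payload[0] & 0xFF) == MARKER_0
                && (payload[1] & 0xFF) == MARKER_1;
    }

    public static void main(String[] args) {
        // First two bytes as reported in the BadHeaderException; remaining bytes are filler.
        byte[] fromQuestion = {0x00, 0x08, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
        System.out.println(hasSingleObjectHeader(fromQuestion)); // false
    }
}
```

If a check like this returns false, the raw binaryDecoder route shown above is the one to try.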

DocuSign Connect update XML deserialization error

I have been using DocuSign SOAP and REST based API calls to create envelopes and am also using their Connect feature to update the recipient and envelope statuses for my clients.
I am getting a strange error parsing DocuSign Connect update for one client.
The error says "There is an error in XML document (1, 16174)".
Here is my code...
Dim sr As New StreamReader(Request.InputStream)
' Read the posted Connect payload into a string.
Dim xml As String = sr.ReadToEnd()
Dim reader As XmlReader = New XmlTextReader(New StringReader(xml))
Dim serializer As New XmlSerializer(GetType(DocuSignEnvelopeInformation), "http://www.docusign.net/API/3.0")
If Not serializer Is Nothing Then
    envelopeInfo = TryCast(serializer.Deserialize(reader), DocuSignEnvelopeInformation)
    Dim envid As String = envelopeInfo.EnvelopeStatus.EnvelopeID.ToString
End If
I have tried a bunch of things, such as removing the XML declaration from the XML document, but nothing worked. The strange thing is that the same code works for all of my other clients. This is the only client that is having issues. They have added close to 65 tags in the document to be signed, but I don't think that the tags are causing the issue, since I also tried removing them.
Please advise.
Minal
I have run into this issue before when there are unsupported characters in the tab values or in the PDF byte stream itself when it is decoded. I suspect that copying and pasting values into tabs from external programs like Word introduces some invisible, unsupported characters, such as carriage returns and the like. You should validate your XML in its entirety.
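One way to narrow this down is to scan the raw payload for characters that XML 1.0 does not allow and compare the position with the one in the error message. The sketch below is in Java rather than VB.NET, and the names XmlCharScan, isXml10Char and firstInvalidIndex are invented for the example. It only looks at literal characters, so an escaped reference to a forbidden character (for example a numeric character reference to a control character) would need a separate check.

```java
public class XmlCharScan {

    // True if the code point is allowed by the XML 1.0 "Char" production.
    // Tab, line feed and carriage return are the only allowed control characters.
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first disallowed character, or -1 if none is found.
    public static int firstInvalidIndex(String xml) {
        for (int i = 0; i < xml.length(); ) {
            int cp = xml.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String clean = "<a>ok</a>";
        String dirty = "<a>" + (char) 0x01 + "</a>"; // control character pasted into a value
        System.out.println(firstInvalidIndex(clean)); // -1
        System.out.println(firstInvalidIndex(dirty)); // 3
    }
}
```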

What should the JCA deployment descriptor (ra.xml) character encoding be?

Looking through the JCA 1.7 specification, the only hint I could find is in one of its Resource Adapter Deployment Descriptor examples (Chapter 13: Message Inflow, p. 13-50):
This example uses UTF-8 encoding, but nothing says whether that was just a choice made for the illustration or a mandatory restriction on the file's character encoding.
I'm asking this because I'm writing a Java program to read one of these files and FindBugs™ is giving me this message:
DM_DEFAULT_ENCODING: Reliance on default encoding
Found a call to a method which will perform a byte to String (or
String to byte) conversion, and will assume that the default platform
encoding is suitable. This will cause the application behaviour to
vary between platforms. Use an alternative API and specify a charset
name or Charset object explicitly.
Line 4 in this Java code snippet is where character encoding will be specified:
01. byte[] contents = new byte[1024];
02. int bytesRead = 0;
03. while ((bytesRead = bin.read(contents)) != -1)
04. result.append(new String(contents, 0, bytesRead));
So, is it possible to specify the expected encoding of this file in this case or not?
From what I have seen, most people use UTF-8 for their ra.xml. However, there is no restriction against using another encoding, so if your parsing expects UTF-8 only, the result might not be what you expect.
So you either need to account for this in your code when you read the file as plain text, or read it as an XML file and save yourself the headache. I don't think the difference in performance will be an issue, because ra.xml files do not usually grow to gigabytes. At least the ones I've seen so far average a few megabytes.
For the FindBugs issue, you just need to specify the encoding, for example UTF-8. Otherwise you will be using the JVM default, which is determined during virtual-machine startup and typically depends on the locale and charset of the underlying operating system. Although relying on the default is not recommended here, if that is what you want, then state the default encoding explicitly. This gets rid of the FindBugs warning.
So your code would look something like this:
01. byte[] contents = new byte[1024];
02. int bytesRead = 0;
03. while ((bytesRead = bin.read(contents)) != -1)
04. result.append(new String(contents, 0, bytesRead, Charset.defaultCharset()));
FindBugs just warns you that you're relying on the default system encoding, so if your application is launched by another user in another country you might get unexpected results. It's better to explicitly specify which encoding you want to use.
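A minimal sketch of the explicit variant, assuming UTF-8 is what the file actually uses. It wraps the stream in an InputStreamReader instead of calling new String(...) on each chunk, which also avoids a multi-byte character being cut in half at a chunk boundary. The class name ExplicitCharsetRead and the method readAll are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRead {

    // Reads the whole stream as text using the given charset instead of the platform default.
    public static String readAll(InputStream in, Charset charset) {
        StringBuilder result = new StringBuilder();
        char[] buffer = new char[1024];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int charsRead;
            while ((charsRead = reader.read(buffer)) != -1) {
                result.append(buffer, 0, charsRead);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Sample content only; a real caller would pass the stream opened on ra.xml.
        byte[] data = "<connector>caf\u00e9</connector>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```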
In your case the actual encoding should be extracted from the XML file itself. There are several ways to get it. One method is to use XMLStreamReader as described in this answer.
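A rough sketch of the XMLStreamReader approach using the standard javax.xml.stream API: the parser reads the XML declaration, and getCharacterEncodingScheme() reports the encoding declared there (or null when none is declared). The class name DeclaredEncoding and the sample document are assumptions for the example, not taken from the linked answer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DeclaredEncoding {

    // Returns the encoding named in the XML declaration, or null if the declaration has none.
    public static String declaredEncoding(InputStream in) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                return reader.getCharacterEncodingScheme();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the XML prolog", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the bytes of an ra.xml file.
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><connector/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(declaredEncoding(new ByteArrayInputStream(doc)));
    }
}
```

In practice it is usually simpler to hand the InputStream straight to an XML parser and let it apply the declared encoding itself, rather than decoding the bytes by hand.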

Avoiding SSIS script task to convert utf-8 to unicode for AS400 data to SQL Server

After many tries I have concluded that the optimal way to transfer data with SSIS from AS400 (non-Unicode) to SQL Server is:
1. Use the native transfer utility to dump the data to TSV (tab-delimited) files
2. Convert the files from UTF-8 to Unicode
3. Use bulk insert to put them into SQL Server
For step #2 I found ready-made code that does this:
string from = @"\\appsrv02\c$\bg_f0101.tsv";
string to = @"\\appsrv02\c$\bg_f0101.txt";
// Read the dump as UTF-8 and rewrite it as UTF-16 (called "Unicode" in .NET), using large buffers.
using (StreamReader reader = new StreamReader(from, Encoding.UTF8, false, 1000000))
using (StreamWriter writer = new StreamWriter(to, false, Encoding.Unicode, 1000000))
{
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        // Skip empty lines.
        if (line.Length > 0)
            writer.WriteLine(line);
    }
}
I need to fully understand what is happening here with the encoding and why this is necessary.
How can I replace this script task with a more elegant solution?
I don't have much insight into exactly why you need the UTF-8 conversion task, except to say that SQL Server, I believe, uses UCS-2 as its native storage format, and this is similar to UTF-16, which is what your task converts the file to. I'm surprised SSIS can't work with a UTF-8 input source, though.
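To make the encoding part concrete, here is a small sketch of what the conversion does at the byte level: the text stays the same, only its byte representation changes. It is written in Java rather than C#, and the class name Utf8ToUtf16 and its helpers are made up for the example. One difference to be aware of: the C# StreamWriter with Encoding.Unicode also writes a byte order mark at the start of the file, which this sketch does not.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ToUtf16 {

    // Decodes UTF-8 bytes to text, then re-encodes the text as UTF-16 little-endian.
    public static byte[] transcode(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_16LE);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "A\u00e9".getBytes(StandardCharsets.UTF_8); // "A" plus e-acute
        System.out.println(hex(utf8));            // 41 C3 A9
        System.out.println(hex(transcode(utf8))); // 41 00 E9 00
    }
}
```

In UTF-8 the accented character takes two bytes and the plain letter one; in UTF-16 both take two, which is the fixed-width layout a Unicode bulk load expects.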
My main point is to answer the "How could I replace this script task with a more elegant solution?":
I have had a lot of success using HiT OLEDB/400 Server. It allows you to set up your AS/400 / iSeries / System i / whatever IBM are calling it this week as a linked server in SQL Server, and you can then access the 400's data directly from the server it's linked to using the standard four-part SQL syntax, e.g. SELECT * FROM my400.my400.myLib.myFile.
Or even better, it's much more efficient as a passthrough query using EXEC...AT.
Using this, you would not need SSIS at all; you'd just need a simple stored proc that does an insert into your destination table directly from the 400 data.

Save and Load UTF-8 From Oracle 10g with iBatis

I'm making a web app that needs to load and save UTF-8 (Korean, specifically) characters from a DB. I've been given an account on the Oracle 10g server, but it saves VARCHAR2 type columns as ASCII7, with each UTF-8 character taking 2 VARCHAR2 slots.
I assumed that since iBatis is writing in the same way that it is reading, if I treat everything from input to output as UTF-8 I will have no problems, but any Korean characters I input come out garbled.
Is there a way to do this properly without messing up the (someone else's) DB?
Further information:
I've previously been able to load Korean strings using:
ResultSet rs = ps.executeQuery();
String koreanString = new String(rs.getBytes("colname"), "euc-kr");
And write Korean strings to db using:
PreparedStatement ps = conn.prepareStatement(sql);
ps.setString(1, new String(koreanString.getBytes("euc-kr"), "ISO-8859-1"));
Attempts to change the JDBC connection url result in this message:
Description
Listener refused the connection with the following error:
ORA-12505, TNS:listener does not currently know of SID given in connect descriptor
The Connection descriptor used by the client was:
[ip]:myTablespace?useUnicode=true&characterEncoding=UTF-8
error dump
javax.servlet.ServletException: Listener refused the connection with the following error:
ORA-12505, TNS:listener does not currently know of SID given in connect descriptor
The Connection descriptor used by the client was:
[ip]:myTablespace?useUnicode=true&characterEncoding=UTF-8
at jeus.servlet.jsp2.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:859)
at jeus.servlet.jsp2.runtime.PageContextImpl.handlePageException(PageContextImpl.java:789)
at jeus_jspwork._jsp._500_managerAdmin_5fjsp._jspService(_500_managerAdmin_5fjsp.java:452)
at jeus.servlet.jsp2.runtime.HttpJspBase.service(HttpJspBase.java:95)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
at jeus.servlet.jsp.JspServletWrapper.executeServlet(JspServletWrapper.java:147)
at jeus.servlet.servlets.JspServlet.execute(JspServlet.java:365)
at jeus.servlet.engine.HttpRequestProcessor.run(HttpRequestProcessor.java:284)
root cause
java.sql.SQLException: Listener refused the connection with the following error:
ORA-12505, TNS:listener does not currently know of SID given in connect descriptor
The Connection descriptor used by the client was:
[ip]:myTablespace?useUnicode=true&characterEncoding=UTF-8
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:261)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:387)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:441)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:165)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:35)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:801)
at java.sql.DriverManager.getConnection(DriverManager.java:525)
at java.sql.DriverManager.getConnection(DriverManager.java:171)
at com.ibatis.common.jdbc.SimpleDataSource.popConnection(SimpleDataSource.java:580)
at com.ibatis.common.jdbc.SimpleDataSource.getConnection(SimpleDataSource.java:222)
at com.ibatis.sqlmap.engine.transaction.jdbc.JdbcTransaction.init(JdbcTransaction.java:48)
at com.ibatis.sqlmap.engine.transaction.jdbc.JdbcTransaction.getConnection(JdbcTransaction.java:89)
at com.ibatis.sqlmap.engine.mapping.statement.MappedStatement.executeQueryForObject(MappedStatement.java:120)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:518)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:493)
at com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryForObject(SqlMapSessionImpl.java:106)
at com.ibatis.sqlmap.engine.impl.SqlMapClientImpl.queryForObject(SqlMapClientImpl.java:82)
at [].admRole.getCount(admRole.java:44)
at jeus_jspwork._jsp._500_managerAdmin_5fjsp._jspService(_500_managerAdmin_5fjsp.java:145)
at jeus.servlet.jsp2.runtime.HttpJspBase.service(HttpJspBase.java:95)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
at jeus.servlet.jsp.JspServletWrapper.executeServlet(JspServletWrapper.java:147)
at jeus.servlet.servlets.JspServlet.execute(JspServlet.java:365)
at jeus.servlet.engine.HttpRequestProcessor.run(HttpRequestProcessor.java:284)
As I stated in the question, strings are stored and retrieved correctly if they are re-encoded as EUC-KR before being turned into ISO-8859-1 (to save, or vice versa to retrieve).
I modified the two following classes:
com.ibatis.sqlmap.engine.mapping.parameter.ParameterMap
com.ibatis.sqlmap.engine.mapping.result.ResultMap
In both cases, I took the Object[] array (parameters and columnValues), cast the values to String, and applied the encoding transformations.
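For reference, the transformation applied in those two classes boils down to the round trip below. This is a sketch under assumptions: the class and method names are invented, it needs a JDK that ships the EUC-KR charset, and it relies on the driver and database passing the characters 0 to 255 through unchanged, which is not guaranteed for every database character set. It also shows where the "2 slots per character" comes from: a Hangul syllable is two bytes in EUC-KR but three in UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EucKrRoundTrip {

    private static final Charset EUC_KR = Charset.forName("EUC-KR");

    // Packs the EUC-KR bytes into single-byte chars so each byte occupies one column slot.
    public static String toStorage(String korean) {
        return new String(korean.getBytes(EUC_KR), StandardCharsets.ISO_8859_1);
    }

    // Reverses toStorage: recovers the bytes and decodes them as EUC-KR.
    public static String fromStorage(String stored) {
        return new String(stored.getBytes(StandardCharsets.ISO_8859_1), EUC_KR);
    }

    public static void main(String[] args) {
        String korean = "\uD55C\uAE00"; // the word "Hangul", two syllables

        System.out.println(toStorage(korean).length());                        // 4
        System.out.println(korean.getBytes(StandardCharsets.UTF_8).length);    // 6
        System.out.println(fromStorage(toStorage(korean)).equals(korean));     // true
    }
}
```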
I have not used Oracle for quite a while, but I am fairly confident that this explains your "listener does not currently know of SID given" error: Can I force JDBC Driver use UTF-8 Charset to encode? The connect descriptor in your error output suggests that the appended ?useUnicode=true&characterEncoding=UTF-8 was read as part of the SID; those are MySQL Connector/J URL parameters, not Oracle ones.
I am a Chinese developer, so the character encoding problem is pretty much the same (we mostly use the GBK character set here). As far as I can remember, "but it saves VARCHAR2 type columns as ASCII7" means that your Oracle instance is a non-Unicode installation?
Forcing string.getBytes(charset) above the JDBC layer is really bad in terms of maintenance and data interpretability (the string data looks like a mess to the DBA, and the DBA cannot use SQL to perform any string comparison on this column, etc.). So my advice is to contact your DBA and get the database working with Unicode first, since Oracle is very capable of handling Unicode data.
