How to remove non-ascii char from MQ messages with ESQL - messagebroker

CONCLUSION:
For some reason the flow wouldn't let me convert the incoming message to a BLOB by changing the Message Domain property of the Input Node so I added a Reset Content Descriptor node before the Compute Node with the code from the accepted answer. On the line that parses the XML and creates the XMLNSC Child for the message I was getting a 'CHARACTER:Invalid wire format received' error so I took that line out and added another Reset Content Descriptor node after the Compute Node instead. Now it parses and replaces the Unicode characters with spaces. So now it doesn't crash.
Here is the code for the added Compute Node:
CREATE FUNCTION Main() RETURNS BOOLEAN
BEGIN
DECLARE NonPrintable BLOB X'0001020304050607080B0C0E0F101112131415161718191A1B1C1D1E1F7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF1F2F3F4F5F6F7F8F9FAFBFCFDFEFF';
DECLARE Printable BLOB X'20202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020';
DECLARE Fixed BLOB TRANSLATE(InputRoot.BLOB.BLOB, NonPrintable, Printable);
SET OutputRoot = InputRoot;
SET OutputRoot.BLOB.BLOB = Fixed;
RETURN TRUE;
END;
UPDATE:
The message is being parsed as XML using XMLNSC. Thought that would cause a problem, but it does not appear to be.
Now I'm using PHP. I've created a node to plug into the legacy flow. Here's the relevant code:
class fixIncompetence {
function evaluate ($output_assembly,$input_assembly) {
$output_assembly->MRM = $input_assembly->MRM;
$output_assembly->MQMD = $input_assembly->MQMD;
$tmp = htmlentities($input_assembly->MRM->VALUE_TO_FIX, ENT_HTML5|ENT_SUBSTITUTE,'UTF-8');
if (!empty($tmp)) {
$output_assembly->MRM->VALUE_TO_FIX = $tmp;
}
// Ensure there are no null MRM fields. MessageBroker is strict.
foreach ($output_assembly->MRM as $key => $val) {
if (empty($val)) {
$output_assembly->MRM->$key = '';
}
}
}
}
Right now I'm getting a vague error about read only messages, but before that it wasn't working either.
Original Question:
For some reason I am unable to impress upon the senders of our MQ
messages that smart quotes, endashes, emdashes, and such crash our XML
parser.
I managed to make a working solution with SQL queries, but it wasted
too many resources. Here's the last thing I tried, but it didn't work
either:
CREATE FUNCTION CLEAN(IN STR CHAR) RETURNS CHAR BEGIN
SET STR = REPLACE('–',STR,'–');
SET STR = REPLACE('—',STR,'—');
SET STR = REPLACE('·',STR,'·');
SET STR = REPLACE('“',STR,'“');
SET STR = REPLACE('”',STR,'”');
SET STR = REPLACE('‘',STR,'&lsqo;');
SET STR = REPLACE('’',STR,'’');
SET STR = REPLACE('•',STR,'•');
SET STR = REPLACE('°',STR,'°');
RETURN STR;
END;
As you can see I'm not very good at this. I have tried reading about
various ESQL string functions without much success.

So in ESQL you can use the TRANSLATE function.
The following is a snippet I use to clean up a BLOB containing non-ASCII low hex values so that it then be cast into a usable character string.
You should be able to modify it to change your undesired characters into something more benign. Basically each hex value in NonPrintable gets translated into its positional equivalent in Printable, in this case always a full-stop i.e. x'2E' in ASCII. You'll need to make your BLOB's long enough to cover the desired range of hex values.
DECLARE NonPrintable BLOB X'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F';
DECLARE Printable BLOB X'2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E';
SET WorkBlob = TRANSLATE(WorkBlob, NonPrintable, Printable);
BTW if messages with invalid characters only come in every now and then I'd probably specify BLOB on the input node and then use something similar to the following to invoke the XMLNSC parser.
CREATE LASTCHILD OF OutputRoot DOMAIN 'XMLNSC'
PARSE(InputRoot.BLOB.BLOB CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding);
With the exception terminal wired up you can then correct the BLOB's of any messages containing parser breaking invalid characters before attempting to reparse.
Finally my best wishes as I've had a number of battles over the years with being forced to correct invalid message content in the "Integration Layer" after all that's what it's meant to do.

Related

How do I convert esp.flash_read() to string

I have the following code:
buff=esp.flash_read(esp.flash_user_start(),50)
print(buff)
I get the following output from print:
bytearray(b'{"ssid": "mySSID", "password": "myPASSWD"}\xff\xff\xff\xff\xff\xff')
What I want to do is get the json in buff. What is the correct "Python-way" to do that?
buff is a Python bytes object, as shown by the print output beginning with b'. To convert this into a string you need to decode it.
In standard Python you could use
buff.decode(errors='ignore')
Note that without specifying errors=ignore you would get a UnicodeDecodeError because the \xff bytes aren't valid in the default encoding, which is UTF-8; presumably they're padding and you want to ignore them.
If that works on the ESP8266, great! However this from the MicroPython docs suggests the keyword syntax might not be implemented - I don't have an ESP8266 to test it. If not then you may need to remove the padding characters yourself:
textLength = find(buff, b'\xff')
text = buff[0:textLength].decode()
or simply:
text = buff[0:buff.find(b'\xff')].decode()
If decode isn't implemented either, which it isn't in the online MicroPython interpreter, you can use str:
text = str(buff[0:find(buff, b'\xff')], 'utf-8')
Here you have to specify explicitly that you're decoding from UTF-8 (or whatever encoding you specify).
However if what you're really after is the values encoded in the JSON, you should be able to use the json module to retrieve them into a dict:
import json
j = json.loads(buff[0:buff.find(b'\xff')])
ssid = j['ssid']
password = j['password']

Print formatted string using a std::vector (of variable size)

I use the Magento 2 REST API and want to handle errors that get thrown in C++.
Example error response:
{
"message": "Could not save category: %1",
"parameters": [
"URL key for specified store already exists."
]
}
I can retrieve both of these into a String and std::vector which leaves me with the question:
How am I able to return a String that is formatted by filling the placeholders?
With a fixed size, I could do something along the lines of this
char* buffer = new char[200];
String message = "Could not save category: %1";
std::vector<String> parameters = {"URL key for specified store already exists."};
String result = sprintf(buffer,message.c_str(),parameters[0]);
But, alas, I do not know the size beforehand.
How should I go about doing this? Are there stl functions that could help, should I be using self-written templates(no experience with this), can I convert the std::vector to a va_list, or are there other solutions?
Edit: I didn't notice this asks for a C++ dialect and not Standard C++. Leaving it for now, might be of use for other people.
Nothing standard exists that will do that automatically. That being said, if your interpolation format is just %<number> it might be easy enough to write:
string message = "Could not save category: %1";
std::vector<string> parameters = {"URL key for specified store already exists."};
for (size_t i = 0; i < parameters.size(); ++i) {
const auto interp = "%" + std::to_string(i+1);
const size_t pos = message.find(interp);
if (pos != std::string::npos)
message.replace(pos, interp.size(), parameters[i]);
}
This will put the result in the message string. Of course this implementation is rather limited and not particularly efficient, but again, doing this properly requires a library-sized solution, not SO-answer-sized one.
live version.

F# non-literal printf format strings - how to make them passable as parameters?

I would like to use non-literal strings for the "format" parameter of a logging type function, as shown here:
// You need to make c:\testDir or something similar to run this.....
//
let csvFile = #"c:\testDir\foo.csv"
open System.IO
let writer file (s:string) =
use streamWriter = new StreamWriter(file, true)
streamWriter.WriteLine(s)
// s
let log format = Printf.ksprintf (writer csvFile) format
let oneString format = (Printf.StringFormat<string->string> format)
let format = oneString "(this does not %s)"
//log format "important string"
log "this works %s" "important string"
My first attempt used a literal string, and the above fragment should work fine for you if you create the directory it needs or similar.
After discovering that you can't just "let bind" a format string, I then learned about Printf.StringFormatand more details about Printf.ksprintf, but I am obviously missing something, because I can't get them to work together with my small example.
If you comment out the last line and reinstate its predecessor, you will see a compiler error.
Making the function writer return a string almost helped (uncomment its last line), but that then makes log return a string (which means every call now needs an ignore).
I would like to know how to have my format strings dynamically settable within the type checked F# printf world!
Update
I added the parameter format to log to avoid a value restriction error that happens if log is not later called as it is in my example. I also change fmt to format in oneString.
Update
This is a different question from this one. That question does not show a function argument being passed to Printf.StringFormat (a minor difference), and it does not have the part about Printf.ksprintf not taking a continuation function that returns unit.
I thought I had found a solution with:
let oneString format = (Printf.StringFormat<string->string,unit> format)
this compiles, but there is a runtime error. (The change is the ,unit)

readByteSync - is this behavior correct?

stdin.readByteSync has recently been added to Dart.
Using stdin.readByteSync for data entry, I am attempting to allow a default value and if an entry is made by the operator, to clear the default value. If no entry is made and just enter is pressed, then the default is used.
What appears to be happening however is that no terminal output is sent to the terminal until a newline character is entered. Therefore when I do a print() or a stdout.write(), it is delayed until newline is entered.
Therefore, when operator enters first character to override default, the default is not cleared. IE. The default is "abc", data entered is "xx", however "xxc" is showing on screen after entry of "xx". The "problem" appears to be that no "writes" to the terminal are sent until newline is entered.
While I can find an alternative way of doing this, I would like to know if this is the way readByteSync should or must work. If so, I’ll find an alternative way of doing what I want.
// Example program //
import 'dart:io';
void main () {
int iInput;
List<int> lCharCodes = [];
print(""); print("");
String sDefault = "abc";
stdout.write ("Enter data : $sDefault\b\b\b");
while (iInput != 10) { // wait for newline
iInput = stdin.readByteSync();
if (iInput == 8 && lCharCodes.length > 0) { // bs
lCharCodes.removeLast();
} else if (iInput > 31) { // ascii printable char
lCharCodes.add(iInput);
if (lCharCodes.length == 1)
stdout.write (" \b\b\b\b chars cleared"); // clear line
print ("\nlCharCodes length = ${lCharCodes.length}");
}
}
print ("\nData entered = ${new String.fromCharCodes(lCharCodes).trim()}");
}
Results on Command screen are :
c:\Users\Brian\dart-dev1\test\bin>dart testsync001.dart
Enter data : xxc
chars cleared
lCharCodes length = 1
lCharCodes length = 2
Data entered = xx
c:\Users\Brian\dart-dev1\test\bin>
I recently added stdin.readByteSync and readLineSync, to easier create small scrips reading the stdin. However, two things are still missing, for this to be feature-complete.
1) Line mode vs Raw mode. This is basically what you are asking for, a way to get a char as soon as it's printed.
2) Echo on/off. This mode is useful for e.g. typing in passwords, so you can disable the default echo of the characters.
I hope to be able to implement and land these features rather soon.
You can star this bug to track the development of it!
This is common behavior for consoles. Try to flush the output with stdout.flush().
Edit: my mistake. I looked at a very old revision (dartlang-test). The current API does not provide any means to flush stdout. Feel free to file a bug.

When parsing XML, the character é is missing

I have an XML as input to a Java function that parses it and produces an output. Somewhere in the XML there is the word "stratégie". The output is "stratgie". How should I parse the XML as to get the "é" character as well?
The XML is not produced by myself, I get it as a response from a web service and I am positive that "stratégie" is included in it as "stratégie".
In the parser, I have:
public List<Item> GetItems(InputStream stream) {
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(stream);
doc.getDocumentElement().normalize();
NodeList nodeLst = doc.getElementsByTagName("item");
List<Item> items = new ArrayList<Item>();
Item currentItem = new Item();
Node node = nodeLst.item(0);
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element item = (Element) node;
if(node.getChildNodes().getLength()==0){
return null;
}
NodeList title = item.getElementsByTagName("title");
Element titleElmnt = (Element) title.item(0);
if (null != titleElmnt)
currentItem.setTitle(titleElmnt.getChildNodes().item(0).getNodeValue());
....
Using the debugger, I can see that titleElmnt.getChildNodes().item(0).getNodeValue() is "stratgie" (without the é).
Thank you for your help.
I strongly suspect that either you're parsing it incorrectly or (rather more likely) it's just not being displayed properly. You haven't really told us anything about the code or how you're using the result, which makes it hard to give very concrete advice.
As ever with encoding issues, the first thing to do is work out exactly where data is getting lost. Lots of logging tends to be the way forward: create a small test case that demonstrates the problem (as small as you can get away with) and log everything about the data. Don't just try to log it as raw text: log the Unicode value of each character. That way your log will have all the information even if there are problems with the font or encoding you use to view the log.
The answer was here: http://www.yagudaev.com/programming/java/7-jsp-escaping-html
You can either use utf-8 and have the 'é' char in your document instead of é, or you need to have a parser that understand this entity which exists in HTML and XHTML and maybe other XML dialects but not in pure XML : in pure XML there's "only" ", <, > and maybe &apos; I don't remember.
Maybe you can need to specify those special-char entities in your DTD or XML Schema (I don't know which one you use) and tell your parser about it.

Resources