Validate and clean up JSON in Omnis Studio? - omnis-studio-8

A valid JSON document looks like this:
[{
"user_id": 1,
"user_name": "John Doe I"
}, {
"user_id": 2,
"user_name": "Jane Doe III"
}]
But if it contains illegal characters, it will not validate. For example, with a CR after "Doe":
[{"user_id":1,"user_name":"John Doe
I"},{"user_id":2,"user_name":"Jane Doe III"}]
My question: is there a "clean up" function in Omnis Studio 8 whose output is valid JSON?
EDIT
Replacing or deleting KNOWN characters is easy. The problem is that text copied from MS Word and the web can contain UNKNOWN characters. So I am looking for a command like
Calculate VALIDJSON as keepvalidchar(NOTVALIDJSON)
Is there such a beast?

According to the post "How do I handle newlines in JSON?", if you want newlines in a text string you need to escape them as \n, so \n in the original JSON string should be a valid replacement for a CR. For example:
[{"user_id":1,"user_name":"John Doe\nI"},{"user_id":2,"user_name":"Jane Doe III"}]
My read on this is that if you are pulling the data out of a database to send some place on the web, you might want to escape the newline to retain it in the text, rather than altering the JSON after the fact. Of course, that depends on the purpose of the API call to the server receiving the json.
I guess the crux is:
if you want the JSON to reflect the original text content, including special characters, you escape them;
if you don't, you remove them. I'd probably clean the original source, since I might want to keep the JSON formatted, and replacing \n with blanks afterwards might not have the effect you want.
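To make the escape-or-remove choice concrete outside Omnis, here is a minimal Python sketch. Python's json module stands in for whatever serializer builds your JSON, and the record is invented from the question's example; it is an illustration, not Omnis code.

```python
import json

# A value as it might come out of a database field filled from Word or the web;
# the raw carriage return after "Doe" is what broke the JSON in the question.
record = {"user_id": 1, "user_name": "John Doe\rI"}

# Escape: a JSON serializer writes the CR as the two characters \r,
# so the line break survives the round trip.
escaped = json.dumps([record], separators=(",", ":"))
print(escaped)  # [{"user_id":1,"user_name":"John Doe\rI"}]

# Remove: clean the source value first, then serialize. split()/join()
# collapses any run of whitespace (CR, LF, tabs, spaces) into one space.
cleaned = dict(record, user_name=" ".join(record["user_name"].split()))
print(json.dumps([cleaned], separators=(",", ":")))  # [{"user_id":1,"user_name":"John Doe I"}]
```

Cleaning at the source keeps the serializer's output valid by construction, whichever option you pick.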

Are you simply looking for a function call like this?
do replaceall(lcVar,kCR,'\n') Returns lcVar
That replaces the CR character with its escaped JSON encoding, '\n'.
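The same post-hoc replacement, sketched in Python for illustration (escape_cr is a made-up name). One caveat worth testing before relying on it: this only works on compact JSON. In pretty-printed text a CR *between* tokens would also become a literal \n outside any string and make the document invalid, and LF or tab characters are not touched at all.

```python
import json


def escape_cr(json_text: str) -> str:
    # Same idea as replaceall(lcVar, kCR, '\n'): every raw carriage return
    # becomes the two characters backslash + n.
    return json_text.replace("\r", "\\n")


broken = '[{"user_id":1,"user_name":"John Doe\rI"},{"user_id":2,"user_name":"Jane Doe III"}]'
fixed = escape_cr(broken)
print(json.loads(fixed)[0]["user_name"] == "John Doe\nI")  # True
```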
I guess the other question is: are you creating the JSON to send, or receiving the JSON and trying to decode it?
If receiving, maybe OJSON.$formatjson() will help.
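As for the asker's hypothetical keepvalidchar(): JSON (RFC 8259) allows any Unicode character inside a string except the quote, the backslash and the control characters U+0000 to U+001F, which must be escaped. So curly quotes or non-breaking spaces copied from Word are already legal; the raw control characters are what break validation. A minimal Python sketch of such a function, assuming that turning those characters into spaces is acceptable (keep_valid_chars is a hypothetical name, not an Omnis function):

```python
import json
import re

# Raw control characters U+0000..U+001F are not allowed inside JSON strings.
CONTROL_CHARS = re.compile(r"[\x00-\x1f]")


def keep_valid_chars(text: str) -> str:
    """Hypothetical helper: replace every raw control character with a space.

    Between tokens a space is ordinary JSON whitespace, so the structure is
    unchanged; inside a string the line break (or tab) becomes a space.
    """
    return CONTROL_CHARS.sub(" ", text)


broken = '[{"user_id":1,"user_name":"John Doe\rI"},{"user_id":2,"user_name":"Jane Doe III"}]'
print(json.loads(keep_valid_chars(broken))[0]["user_name"])  # John Doe I
```

This cannot repair a stray unescaped " or \ in the pasted text: those are legal characters, so after the fact there is no way to tell whether they were meant as data or as JSON syntax.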

Related

How to convert a formatted string into plain text

Users copy, paste, and send data in the following format: "𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖"
I need to convert it into plain text (ASCII characters, say) like 'jovy debbie'.
It comes in different fonts and formats, for example:
'𝑱𝒆𝒏𝒊𝒄𝒂 𝑫𝒖𝒈𝒐𝒔'
'π™ΆπšŽπšŸπš’πšŽπš•πš’πš— π™½πš’πšŒπš˜πš•πšŽ π™»πšžπš–πš‹πšŠπš'
Any help will be appreciated. I have already looked at other Stack Overflow questions, but no luck :(
Those letters are from the Mathematical Alphanumeric Symbols block.
Since they have a fixed offset to their ASCII counterparts, you could use tr to map them, e.g.:
"𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖".tr("𝕒-𝕫", "a-z")
#=> "jovy debbie"
The same approach can be used for the other styles, e.g.
"𝑱𝒆𝒏𝒊𝒄𝒂 𝑫𝒖𝒈𝒐𝒔".tr("𝒂-𝒛𝑨-𝒁", "a-zA-Z")
#=> "Jenica Dugos"
This gives you full control over the character mapping.
Alternatively, you could try Unicode normalization. The NFKC / NFKD forms should remove most formatting and seem to work for your examples:
"𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖".unicode_normalize(:nfkc)
#=> "jovy debbie"
"𝑱𝒆𝒏𝒊𝒄𝒂 𝑫𝒖𝒈𝒐𝒔".unicode_normalize(:nfkc)
#=> "Jenica Dugos"
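For comparison, the same two techniques sketched in Python (an illustration, not the Ruby answer's code): an explicit offset table for str.translate, which plays the role of tr, and unicodedata.normalize with NFKC. Only the lowercase double-struck range is mapped in the table; other styles need their own start offset (bold italic small a, for instance, is U+1D482).

```python
import unicodedata

# Mathematical double-struck small a (𝕒) is U+1D552; the 26 small letters
# follow it contiguously, just like a..z.
DOUBLE_STRUCK_SMALL_A = 0x1D552

# Character table for str.translate, the counterpart of Ruby's tr("𝕒-𝕫", "a-z").
DOUBLE_STRUCK_TO_ASCII = {DOUBLE_STRUCK_SMALL_A + i: ord("a") + i for i in range(26)}


def from_double_struck(text: str) -> str:
    return text.translate(DOUBLE_STRUCK_TO_ASCII)


def to_plain(text: str) -> str:
    # NFKC replaces "font variant" compatibility characters with plain letters.
    return unicodedata.normalize("NFKC", text)


print(from_double_struck("𝕛𝕠𝕧𝕪 𝕕𝕖𝕓𝕓𝕚𝕖"))  # jovy debbie
print(to_plain("𝑱𝒆𝒏𝒊𝒄𝒂 𝑫𝒖𝒈𝒐𝒔"))  # Jenica Dugos
```

The table gives you exact control over what maps where; NFKC is shorter but also folds other compatibility characters (ligatures, full-width forms), which may or may not be what you want.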

How to remove ANSI codes from a string?

I am working on string manipulation in Lua and having trouble with the following problem.
Here is an example of the original data I am given:
"[0;1;36m(Web): You say, "Text here."[0;37m"
I want to keep the string intact except for removing the ANSI codes.
I have been pointed toward using gsub with Lua pattern matching, but I cannot seem to get the pattern right. I am also unsure how to reference the escape character that is actually sent.
text:gsub("[\27\[([\d\;]+)m]", "")
or
text:gsub("%x%[[%d+;+]m", "")
If successful, all I want to be left with, using the above example, would be:
(Web): You say, "Text here."
Your string example is missing the escape character, ASCII 27.
Here's one way:
-- \27 is ESC (ASCII 27). Decimal escapes work in every Lua version;
-- '\x1b' only works from Lua 5.2 on.
s = '\27[0;1;36m(Web): You say, "Text here."\27[0;37m'
s = s:gsub('\27%[%d+;%d+;%d+;%d+;%d+m','')
    :gsub('\27%[%d+;%d+;%d+;%d+m','')
    :gsub('\27%[%d+;%d+;%d+m','')
    :gsub('\27%[%d+;%d+m','')
    :gsub('\27%[%d+m','')
print(s)
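Instead of one gsub per parameter count, a single pattern can accept any number of parameters. The sketch below uses Python's re for a quick check; the equivalent Lua pattern would be '\27%[[%d;]*m'. It assumes only SGR sequences (the colour/style codes ending in m) need removing; other escape sequences, such as cursor movement, end in different letters and are left alone.

```python
import re

# ESC [ <numbers separated by ;> m is an SGR ("set colour/style") sequence.
# [0-9;]* accepts any number of parameters, including none (ESC [ m).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    return ANSI_SGR.sub("", text)


s = '\x1b[0;1;36m(Web): You say, "Text here."\x1b[0;37m'
print(strip_ansi(s))  # (Web): You say, "Text here."
```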

Regular Expression Assistance (RegEx)

I'm trying to create a regular expression string that will capture the data between the opening and closing [] brackets and include the brackets from the following data:
data: [{"LOTS OF DATA}],
datatype: "local",
So far I'm using a regEx string "data:(.*)" and this is returning:
[{"LOTS OF DATA}],
This is almost correct, but it includes the ',', and it only works because there's a newline or carriage return before 'datatype:'. So I have two questions:
How do I capture all characters including the newline & carriage return?
How do I match the ', datatype:' string? The issue is that I cannot guarantee the type and number of characters between the ',' and the 'datatype:' string. Do I need a wildcard? The regex would look something like "data:(.*),???datatype:", where ??? is the wildcard.
Thanks for your help, this will be used within an iOS application.
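data:\s*(\[[^\[\]]*\])\s*,\s*datatype:
Group 1 captures the brackets together with everything between them. This implies that no square brackets may occur within LOTS OF DATA. Because a negated character class such as [^\[\]] also matches newlines and carriage returns, no special flag is needed for multi-line data, and \s* serves as the wildcard between the ',' and 'datatype:', assuming only whitespace separates them.
You could even drop the trailing 'datatype:' match.
Should LOTS OF DATA contain square brackets, you would have to come up with a more precise specification of its content.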
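A quick way to sanity-check the pattern is Python's re module. This is a sketch with made-up sample text; an iOS app would use NSRegularExpression instead, whose syntax for these particular constructs (\s, \[ and a negated class) should be the same, but test it there too.

```python
import re

# Sample input: a CR/LF plus indentation sits between the ',' and 'datatype:'.
text = 'data: [{"LOTS OF DATA}],\r\n    datatype: "local",'

# Group 1 captures the brackets and everything between them; [^\[\]] also
# matches newlines, so multi-line data needs no DOTALL flag.
pattern = re.compile(r"data:\s*(\[[^\[\]]*\])\s*,\s*datatype:")

match = pattern.search(text)
print(match.group(1))  # [{"LOTS OF DATA}]
```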

How can I parse the standard input with the Erlang API?

I'm developing a game in Erlang, and now I need to read the standard input. I tried the following calls:
io:fread()
io:read()
The problem is that I can't read a whole string when it contains white space. So I have the following questions:
How can I read the string the user types when they press the Enter key? (Remember that the string contains white space.)
How can i convert a string like "56" in the number 56?
Read line
You can use io:get_line/1 to get a line-feed-terminated string from the console.
3> io:get_line("Prompt> ").
Prompt> hello world how are you?
"hello world how are you?\n"
io:read reads an Erlang term, so you can't read a plain string unless you make your users wrap it in quotes.
The patterns in io:fread do not seem to let you read an arbitrary-length string containing spaces.
Parse integer
You can convert "56" to 56 using erlang:list_to_integer/1.
5> erlang:list_to_integer("56").
56
or using string:to_integer/1, which also returns the rest of the string:
10> string:to_integer("56hello").
{56,"hello"}
11> string:to_integer("56").
{56,[]}
The Erlang documentation for io:fread/2 should help you out.
You can use field lengths in order to read an arbitrary length of characters (including whitespace):
io:fread("Prompt> ","~20c").
Prompt> This is a sentence!!
{ok,["This is a sentence!!"]}
As for converting a string (a list of characters) to an integer, erlang:list_to_integer/1 does the job:
7> erlang:list_to_integer("645").
645
Edit: try experimenting with io:fread/2; its format sequences can ease parsing by applying a form of pattern matching:
9> io:fread("Prompt> ","~s ~s").
Prompt> John Doe
{ok,["John","Doe"]}
The console is not really a good place for this, because you need to know the format of the answer in advance. Since you allow spaces, you would need to know how many words will be entered before reading the answer. Given that, you can read the entry as a single quoted string and parse it later:
1> io:read("Enter a text > ").
Enter a text > "hello guy, this is my answer :o)".
{ok,"hello guy, this is my answer :o)"}
2>
The bad news is that the user must enter the quotes and a final dot, not user friendly...

Removing whitespace in ActionScript 2 variables

Let's say that I have an XML file containing this:
<description><![CDATA[
<h2>lorem ipsum</h2>
<p>some text</p>
]]></description>
which I want to load and parse in ActionScript 2 as HTML text, setting some CSS before displaying it. The problem is that Flash takes that whitespace (line feeds and tabs) and displays it as is:
<some whitespace here>
lorem ipsum
some text
where the output I want is
lorem ipsum
some text
I know that I could remove the whitespace directly from the XML file (the Flash developer at my workplace also suggests this; I guess he doesn't have any idea how to do it [sigh]). But doing so would make the section in the XML file hard to read, especially when lots of tags are involved, and that makes editing more difficult.
So now I'm looking for a way to strip that whitespace in ActionScript. I've tried using an ActionScript equivalent of PHP's str_replace (got it from here). But what should I use as the needle (the string to search for)? I've tried "\t" and "\r", but neither seems to match that whitespace.
edit :
Now that I've tried using newline as the needle, it works (newlines do get stripped):
mystring = str_replace(newline, '', mystring);
But newlines only get stripped once: in every run of consecutive newlines (e.g. a newline followed by another newline), only one newline is stripped away.
I don't think this is a problem in the str_replace function itself, since consecutive characters other than newline get stripped away just fine.
I'm pretty confused about how things like this are handled in ActionScript. :-s
edit 2:
I've tried str_replace-ing everything I know of: \n, \r, \t, newline, and tab (typed by pressing the Tab key). Replacing \n, \r, and \t seems to have no effect whatsoever.
I know that by successfully doing this, my content can never have real line breaks. That's exactly my intention. I could format the XML the way I want without Flash displaying any of the formatting stuff. :)
Several ways to approach this. Perhaps the simplest answer is, in one sense your Flash developer is probably right, and you should move your whitespace outside of the CDATA container. The reason being, many people (me at least) tend to assume that everything inside a CDATA is "real data", as opposed to markup. On the other hand, whitespace outside a CDATA is normally assumed to be irrelevant, so data like this:
<description>
<![CDATA[<h2>lorem ipsum</h2>
<p>some text</p>]]>
</description>
would be easier to understand and to work with. (The flash developer can use the XML.ignoreWhite property to ignore the whitespace outside the CDATA.)
With that said, if you're editing the XML by hand, then I can see why it would be easier to use the formatting you describe. However, if the extra whitespace is inside the CDATA, then it will inevitably be included in the String data you extract, so your only option is to grab the content of the CDATA and remove the whitespace afterwards.
Then your question reduces to "how do I strip leading/trailing whitespace from a String in AS2?". And unfortunately, since AS2 doesn't support RegEx there's no simple way to do this. I think your best option would be to parse through from the beginning and end to find the first/last non-white character. Something along these lines (untested pseudocode):
myString = stuffFromXML;
whitespace = " " + "\t" + "\n" + "\r" + newline;
start = 0;
end = myString.length;
// The start < end checks stop both loops on an empty or all-whitespace string.
while ( start < end && testString( myString.substr(start,1), whitespace ) ) { start++; }
while ( end > start && testString( myString.substr(end-1,1), whitespace ) ) { end--; }
trimmedString = myString.substring( start, end );
function testString( needle, haystack ) {
return ( haystack.indexOf( needle ) > -1 );
}
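The same two-index scan, sketched in Python so the loop bounds are easy to test. This is an illustration of the idea only; in Python itself you would simply call str.strip(WHITESPACE).

```python
# Characters treated as whitespace, as in the ActionScript example.
WHITESPACE = " \t\n\r"


def trim(s: str) -> str:
    start, end = 0, len(s)
    # The start < end guards stop both scans on an empty or all-whitespace string.
    while start < end and s[start] in WHITESPACE:
        start += 1
    while end > start and s[end - 1] in WHITESPACE:
        end -= 1
    return s[start:end]


print(repr(trim("\n\t\t<h2>lorem ipsum</h2>\n\t\t<p>some text</p>\n")))
# '<h2>lorem ipsum</h2>\n\t\t<p>some text</p>'
```

Note that only leading and trailing whitespace goes; the line break and tabs between the two tags survive, which is exactly the limitation discussed next.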
Hope that helps!
Edit: I notice that in your example you'd also need to remove tabs and whitespace within your text data. This would be tricky, unless you can guarantee that your data will never include "real" tabs in addition to the ones for formatting. No matter what you do with the CDATA tags, it would probably be wiser not to insert extraneous formatting inside your real content and then remove it programmatically afterward. That's just making your own life difficult.
Second edit: As for what character to remove to get rid of newlines, it depends partially on what characters are actually in the XML to begin with (which probably depends on what OS is running where the file is generated), and partially on what character the client machine (that's showing the flash) considers a newline. Lots of gory details here. In practice though, if you remove \r, \n, and \r\n, that usually does the trick. That's why I added both \r and \n to the "whitespace" string in my example code.
It's been a while since I've tinkered with AS2.
someXML = new XML();
someXML.ignoreWhite = true;
If you want to use str_replace, try '\n'.
Is there a reason you are using CDATA? Admittedly I have no idea what the best practice for this sort of thing is, but I tend to leave it out and just have the HTML sit there inside the node.
var foo = node.childNodes.join(""); parses it out just fine, and I never seem to come across these whitespace problems.
I'm reading this over and over again, and if I'm interpreting you right, all you want to know how to do is strip certain characters (tabs and newlines) from a string in AS2, right? I cannot believe no one has given you the simple one line answer yet:
myString = myString.split("\n").join("");
That's it. Repeat that for \r, \n, and \t and all newlines and tabs will be gone. If you want it as an easy function, then do this:
function stripWhiteSpace(str: String) : String
{
return str.split("\r").join("").split("\n").join("").split("\t").join("");
}
That function won't modify your old string, it will return a new one without \r, \n, or \t. To actually modify the old string use that function like this:
myString = stripWhiteSpace(myString);
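The split/join idiom carries over directly to other languages. A Python sketch for illustration: splitting on a character and joining the pieces with "" deletes every occurrence, so runs of consecutive newlines disappear too, which is exactly what the str_replace attempt in the question was missing.

```python
def strip_white_space(s: str) -> str:
    # Split on each character and re-join with "" to delete all occurrences.
    for ch in ("\r", "\n", "\t"):
        s = "".join(s.split(ch))
    return s


print(strip_white_space("\n\t<h2>lorem ipsum</h2>\n\n\t<p>some text</p>\n"))
# <h2>lorem ipsum</h2><p>some text</p>
```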
