Taking json from a website and getting some json results back with escaped characters.
What's the best design pattern for removing the characters we don't like?
Here are some we don't like from here:
\b Backspace (ascii code 08)
\f Form feed (ascii code 0C)
\n New line
\r Carriage return
\t Tab
\v Vertical tab
\' Apostrophe or single quote
\" Double quote
\ Backslash character
But my question is, in general, should you do
it before you save it?
after you save it, convert it right away?
or should you convert it right before you display/use it?
Remove it right after downloading. It's kinda costly (especially if you use regular expressions), so it'll be done once, and not needed again.
Related
I have JSON strings that may contain \n, \t, which I don't want to save into database. strip_tags helps only with simple strings. I am using gsub(/(\\n)|(\\t)/, "").
I wonder if there is another Rails helper method or a better way to achieve this.
e.g
"[{\"type\":\"checkbox-group\",\"label\":\"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nFill in\\nthe Gap (Please\\nfill in the blank box with correct wor\",\"name\":\"checkbox-group-1527245153706\",\"values\":[{\"label\":\"Option 1\",\"value\":\"option-1\",\"selected\":true}]},{\"type\":\"text\",\"label\":\"\\n\\t\\t\\n\\t\\n\\t\\n\\t\\t\\n\\t\\t\\t\\n\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\tWhat are the unique features of\\ne-commerce, digital markets, and\\ndigital goods? \\n\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\n\\t\\t\\t\\n\\t\\t\",\"className\":\"form-control\",\"name\":\"text-1527245426509\",\"subtype\":\"text\"}]"
You can make use of squish or squish!
" Some text \n\n and a tab \t\t with new line \n\n ".squish
#=> "Some text and a tab with new line"
Squish removes all the whitespace chars on both ends and grouping remaining whitespace chars (\n, \t, space) in one space
I think this one may help you,
JSON.parse(string).map{ |a| a['label'] = a['label'].squish; a}
Does anyone understand what this (([A-Za-z\\s])+)\\? means?
I wonder why it should be "\\s" and "\\" ?
If I entered "\s", Xcode just doesn't understand and if I entered "\?", it just doesn't match the "?".
I have googled a lot, but I did not find a solution. Anyone knows?
The actual regex is (([A-Za-z\s])+)\?. This matches one or more letters and whitespace characters followed by an question mark. The \ has two different meanings here. In the first instance \s has a fixed meaning and stands for any white space characters. In the second instance the \? means the literal question mark character. The escaping is necessary as the question mark means one or none of the previous otherwise.
You can't type your regex like this in a string literal in C code though. C also does some escaping using the backslash character. For example "\n" is translated to a string containing only a newline character. There are some other escape sequences with special meanings. If the character after the backslash doesn't have a special meaning the backslash is just removed. That means if you want to have a single backspace in your string you have to write two.
So if you wrote your regex string as you wanted you'd get different results as it would be interpreted as (([A-Za-zs])+)? which has a completely different meaning. So when you write a regex in an ObjC (or any other C-based language) string literal you must double all backslash characters.
not sure about ios but same thing happens in java. \ is escape character for java,and c also so when you type \s java reads \ as an escape character.
think of it as if you want to print a \ what will you have to do.
you will have to type \\. now first \ will work as escape character for java and second one will be printed.
I think it should be the same concept for ios too.
so if you want \s you type \s, if you want \ you type \\.
The \s metacharacter is used to find a whitespace character.
Refer this!
Currently I am learning how to work with HL7 and how to parse it in python. Now I was wondering what happens if a value in a HL7 segment contains a pipe sign, e.g. '|'. How is this sign handled? If there is no masking, it would lead to a crash of the HL7 parser. Is there a masking possibility?
\F\
You should read the relevant sections of chapter 2 of the version 2 standard about how escaping works in version 2.
The HL7 structure has defined escape sequences for the separators like |.
When you look at a HL7 message, the used five delimiters are right after the MSH:
MSH|^~\&
| is the Field separator F
^ the component separator S
~ is the repetition separator (for the second level elements) R
\ is the escape character E
& is the sub-component separator T
So to escape one of the special characters like |, you have to take the escape character and then add the defined letter (F,S, etc.)
So in above case, to escape the | you would have to put \F\. Or escaping the escape character is \E\.
If you like you can also change the delimiters after the MSH completely, but I don't recommend that.
I have seen the following on StackOverflow about URL characters:
There are two sets of characters you need to watch out for - Reserved and Unsafe.
The reserved characters are:
ampersand ("&")
dollar ("$")
plus sign ("+")
comma (",")
forward slash ("/")
colon (":")
semi-colon (";")
equals ("=")
question mark ("?")
'At' symbol ("#").
The characters generally considered unsafe are:
space,
question mark ("?")
less than and greater than ("<>")
open and close brackets ("[]")
open and close braces ("{}")
pipe ("|")
backslash ("\")
caret ("^")
tilde ("~")
percent ("%")
pound ("#").
I'm trying to code a URL so I can parse it using delimiters. They can't be numbers or letters though. Does anyone have a list of characters that are NOT Reserved but ARE safe to use?
Thanks for any help you can provide.
Don't bother trying to use safe/unreserved characters. Just use whatever delimiters you want and URLencode the whole thing. Then URL decode it on the other end and parse normally.
Is there a reason you can't just use the standard delimiter for URL parameters (&)? That is the most straightforward way to do it instead of trying to roll your own.
For example the standard URL syntax already allows for multi-valued paramaters natively. This is perfectly legal and doesn't require any trickery.
Somepage.aspx?parameterName=A¶meterName=B
The result is that the page would be passed "A,B" in the parameterName attribute.
Could anybody help me make a proper regular expression from a bunch of text in Ruby. I tried a lot but I don't know how to handle variable length titles.
The string will be of format <sometext>title:"<actual_title>"<sometext>. I want to extract actual_title from this string.
I tried /title:"."/ but it doesnt find any matches as it expects a closing quotation after one variable from opening quotation. I couldn't figure how to make it check for variable length of string. Any help is appreciated. Thanks.
. matches any single character. Putting + after a character will match one or more of those characters. So .+ will match one or more characters of any sort. Also, you should put a question mark after it so that it matches the first closing-quotation mark it comes across. So:
/title:"(.+?)"/
The parentheses are necessary if you want to extract the title text that it matched out of there.
/title:"([^"]*)"/
The parentheses create a capturing group. Inside is first a character class. The ^ means it's negated, so it matches any character that's not a ". The * means 0 or more. You can change it to one or more by using + instead of *.
I like /title:"(.+?)"/ because of it's use of lazy matching to stop the .+ consuming all text until the last " on the line is found.
It won't work if the string wraps lines or includes escaped quotes.
In programming languages where you want to be able to include the string deliminator inside a string you usually provide an 'escape' character or sequence.
If your escape character was \ then you could write something like this...
/title:"((?:\\"|[^"])+)"/
This is a railroad diagram. Railroad diagrams show you what order things are parsed... imagine you are a train starting at the left. You consume title:" then \" if you can.. if you can't then you consume not a ". The > means this path is preferred... so you try to loop... if you can't you have to consume a '"' to finish.
I made this with https://regexper.com/#%2Ftitle%3A%22((%3F%3A%5C%5C%22%7C%5B%5E%22%5D)%2B)%22%2F
but there is now a plugin for Atom text editor too that does this.