I want to use this code:
replaceregexp match='app_name">(.*)<'
But the character < can't be used there.
How can I do it?
When you want to use characters that stand for the predefined XML entities in an attribute value or in text, you must write them as entity references.
The predefined entities cover " ' < > & (written &quot; &apos; &lt; &gt; &amp;).
In your case you have to write
replaceregexp match='app_name">(.*)&lt;'
(Here the single quote delimits the attribute value; if a single quote appears inside the value, you have to use the entity &apos;.)
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
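If you generate the build file from a script, a minimal Ruby sketch of escaping the value before it goes into a quoted XML attribute might look like this. The helper name xml_attr_escape is hypothetical; the replacement table is just the five predefined entities listed above. Escaping all five is always safe, whichever quote character delimits the attribute.

```ruby
# The five predefined XML entities.
XML_ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&apos;'
}.freeze

# Hypothetical helper: make a string safe inside a quoted XML attribute.
# String#gsub with a Hash replaces each match with the Hash value for it.
def xml_attr_escape(value)
  value.gsub(/[&<>"']/, XML_ESCAPES)
end

pattern = %q{app_name">(.*)<}
puts xml_attr_escape(pattern)  # prints: app_name&quot;&gt;(.*)&lt;
```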
Use &lt; and &gt; for the characters < and >.
http://www.jguru.com/faq/view.jsp?EID=721755
Given a string like "Whatup <b>whatever<b> \n", I need to turn that into "Whatup whatever".
I'm pretty close with the method below, but I can't find a good way to remove the various HTML entity codes (the &...; kind). I don't want to gsub each one out individually (as I'm doing with the comma): there are hundreds of thousands of rows and many different codes in them.
Any pointers are welcome.
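def self.clean_string(st)
  # strip_tags is Rails' ActionView::Helpers::SanitizeHelper#strip_tags and
  # squish comes from ActiveSupport; nil or "" input falls through and returns nil.
  return strip_tags(st).force_encoding("UTF-8").gsub(",", "").squish if st && st != ""
end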
For the HTML entities, add this regex replacement:
.gsub(/&[^;]+;/, '')
It will remove any &-style entity from the text.
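A plain-Ruby sketch of the whole cleanup (no Rails), useful for trying the idea outside a Rails app. Two assumptions to flag: the tag regex is only a naive stand-in for strip_tags, and the entity pattern is a stricter swap for /&[^;]+;/ that matches only well-formed references (&name;, &#123;, &#x1F;). The looser pattern would also eat text such as "& b;" in "a & b; c", while this one leaves it alone.

```ruby
# Only well-formed references: &name;, &#decimal; or &#xhex;.
ENTITY = /&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);/

def clean_string(st)
  return nil if st.nil? || st.empty?
  s = st.gsub(/<[^>]*>/, '')  # naive stand-in for Rails' strip_tags
  s = s.gsub(ENTITY, '')      # drop all named and numeric entities in one pass
  s = s.delete(',')           # same comma removal as the original method
  s.gsub(/\s+/, ' ').strip    # collapse whitespace, roughly like squish
end

p clean_string("Whatup <b>whatever<b> &amp;&nbsp;\n")  # => "Whatup whatever"
```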
Hello and thank you for reading my post.
The Apache Commons StringEscapeUtils.escapeHtml3() and StringEscapeUtils.escapeHtml4() functions can, in particular, convert accented characters (like é, à...) in a string into
character entity references of the form &name;, where name is a case-sensitive alphanumeric string.
How can I get an escaped version of a given string that uses numeric character references instead (&#nnnn; or &#xhhhh;, where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form)?
I actually need to escape strings for an XML document that doesn't know about entities such as &eacute;, &agrave;, etc.
Best regards.
To solve this problem, I wrote a method that takes a string as an argument and replaces the character entity references in it (like &eacute;) with their corresponding numeric character references (&#233; in this case).
I used this W3C list of references: http://www.sagehill.net/livedtd/xhtml1-transitional/xhtml-lat1.ent.html
Note: it would be great to be able to pass another argument to StringEscapeUtils.escapeHtml4() to tell it whether we want character entity references or numeric character references in the output string...
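A minimal Ruby sketch of the replacement method described above, assuming a lookup table from entity names to code points. The three entries are a deliberately tiny, hypothetical subset of the xhtml-lat1.ent list; names not in the table (including &amp;, which XML already knows) are left untouched.

```ruby
# Tiny illustrative subset of xhtml-lat1.ent; a real table would list them all.
NAMED_TO_CODEPOINT = {
  'nbsp'   => 0xA0,
  'agrave' => 0xE0,
  'eacute' => 0xE9
}.freeze

# Replace &name; with &#nnn; when the name is in the table; keep anything else.
def named_to_numeric(s)
  s.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do
    cp = NAMED_TO_CODEPOINT[Regexp.last_match(1)]
    cp ? "&##{cp};" : Regexp.last_match(0)
  end
end

puts named_to_numeric('caf&eacute; &agrave; la carte &amp; more')
# prints: caf&#233; &#224; la carte &amp; more
```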
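Create your CharSequenceTranslator:
import org.apache.commons.lang3.StringEscapeUtils;
import org.apache.commons.lang3.text.translate.CharSequenceTranslator;
import org.apache.commons.lang3.text.translate.NumericEntityEscaper;
// XML 1.1 escaping, plus every code point from 0x7f upward written as &#nnn;
static final CharSequenceTranslator XML_ESCAPE = StringEscapeUtils.ESCAPE_XML11.with(
    NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) );
and use it:
XML_ESCAPE.translate(…)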
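For comparison, a rough Ruby analogue of what that translator produces: the five XML specials become named entities, and every code point from 0x7f upward becomes a decimal &#nnn; reference (the part NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) contributes). It is a sketch, not a port: ESCAPE_XML11 also handles certain control and invalid characters, which this version ignores, and xml_escape_numeric is a hypothetical name.

```ruby
BASIC = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;' }.freeze

# Escape the five XML specials by name and every code point >= 0x7f as &#nnn;.
# each_char yields whole code points, so astral characters become one reference.
def xml_escape_numeric(s)
  s.each_char.map do |ch|
    if BASIC.key?(ch)
      BASIC[ch]
    elsif ch.ord >= 0x7f
      "&##{ch.ord};"
    else
      ch
    end
  end.join
end

puts xml_escape_numeric("caf\u00e9 <\u00e0 la carte>")
# prints: caf&#233; &lt;&#224; la carte&gt;
```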
I need to parse vCard 2.1 Name property.
In vCard 3.0 and 4.0 each component of the Name property can have multiple values. For example, RFC 2426 clearly states:
Individual text components can include multiple text values (e.g., multiple Additional Names) separated by the COMMA character (ASCII decimal 44).
But in vCard 2.1 it looks like each component can have only one value, even though at least Additional Names (the third field) can probably contain commas.
Can the vCard 2.1 Name property have commas in the Family Name, Given Name, Additional Names, Name Prefix and Name Suffix strings? Should they be treated as "multiple text values", as in vCard 3.0 and 4.0?
According to the formal BNF definition in the 2.1 specs, a comma-delimited list of values within a component is not supported. The BNF says nothing about having to escape comma characters.
nameparts = 0*4(strnosemi ";") strnosemi
; Family, Given, Middle, Prefix, Suffix.
; Example:Public;John;Q.;Reverend Dr.;III, Esq.
strnosemi = *(*nonsemi ("\;" / "\" CRLF)) *nonsemi
; To include a semicolon in this string, it must be escaped
; with a "\" character.
One of the examples cited for the N property also seems to imply that commas have no special meaning.
N:Veni, Vidi, Vici;The Restaurant.
So, it looks like the N property does not support multiple values in vCard 2.1.
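A minimal Ruby sketch of splitting an N value under the 2.1 grammar quoted above: only unescaped semicolons separate components, \; yields a literal semicolon, and commas stay ordinary text. The name split_n_21 is hypothetical; the sketch ignores the backslash-CRLF continuation and does not enforce the five-component limit.

```ruby
# Split a vCard 2.1 N value into components (Family;Given;Middle;Prefix;Suffix).
# Only ";" separates components, "\;" is a literal ";", commas are plain text.
def split_n_21(value)
  parts = []
  current = +''
  i = 0
  while i < value.length
    ch = value[i]
    if ch == '\\' && value[i + 1] == ';'
      current << ';'    # escaped semicolon stays in the current component
      i += 2
    elsif ch == ';'
      parts << current  # unescaped semicolon ends the component
      current = +''
      i += 1
    else
      current << ch
      i += 1
    end
  end
  parts << current
end

p split_n_21("Veni, Vidi, Vici;The Restaurant.")
# => ["Veni, Vidi, Vici", "The Restaurant."]
```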
I have a text file which reads:
config<001>25
23<220>12
.....
How can I parse it so that I get the values config, 001 (to be converted to an integer after extracting it, using strtok or any other method you suggest) and 25 (also to be converted to an integer) separately? I tried strtok, but it's not working the way I need. Please help me.
Use LINQ to SQL to import the file, splitting on the delimiters, and then use something like AutoMapper to map the fields to specific objects with specific types.
I did this exact thing in another project and it works great.
Based on the mention of strtok I'm guessing that you're using C or C++. If you're using C++, I'd probably handle this by creating a ctype facet that treats < and > as white space, which will make the parsing trivial (infile >> string >> number1 >> number2;).
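If you're using C, you can use a scan-set conversion with sscanf, something like: sscanf(line, "%[^<]<%d>%d", string, &number1, &number2); and check that it returns 3. (The literal < and > in the format string consume the delimiters; without the <, the first %d would fail on it.)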
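The same line format can be sketched in Ruby with one anchored regex instead of strtok or sscanf, assuming every line is name<number>number. The names LINE_FORMAT and parse_line are hypothetical. Integer(str, 10) is used with an explicit base because plain Integer("010") treats a leading zero as octal.

```ruby
# name, "<", digits, ">", digits, e.g. "config<001>25"
LINE_FORMAT = /\A([^<]*)<(\d+)>(\d+)\z/

# Returns [name, first_number, second_number], or nil if the line doesn't fit.
def parse_line(line)
  m = LINE_FORMAT.match(line.chomp)
  return nil unless m
  # Explicit base 10, so "010" is ten rather than octal eight.
  [m[1], Integer(m[2], 10), Integer(m[3], 10)]
end

["config<001>25\n", "23<220>12\n"].each { |l| p parse_line(l) }
# => ["config", 1, 25]
# => ["23", 220, 12]
```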
Could anybody help me write a proper regular expression in Ruby to pull text out of a string? I tried a lot, but I don't know how to handle variable-length titles.
The string will be of the format <sometext>title:"<actual_title>"<sometext>. I want to extract actual_title from this string.
I tried /title:"."/, but it doesn't find any matches because it expects the closing quotation mark exactly one character after the opening one. I couldn't figure out how to make it match a string of variable length. Any help is appreciated. Thanks.
. matches any single character (except a newline). Putting + after an element matches one or more of it, so .+ matches one or more characters of any sort. You should also put a question mark after it so that the match stops at the first closing quotation mark it comes across. So:
/title:"(.+?)"/
The parentheses are necessary if you want to extract the title text that it matched out of there.
/title:"([^"]*)"/
The parentheses create a capturing group. Inside is first a character class. The ^ means it's negated, so it matches any character that's not a ". The * means 0 or more. You can change it to one or more by using + instead of *.
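A quick Ruby check of the patterns discussed here, using String#[] with a capture index to pull out group 1. The sample strings are made up for illustration. The greedy .+ is included only to show why the ? matters, and the empty-title case shows where .+? and [^"]* differ.

```ruby
s = 'x title:"Foo Bar" y="z"'

greedy  = s[/title:"(.+)"/, 1]     # .+ runs on to the LAST quote in the string
lazy    = s[/title:"(.+?)"/, 1]    # .+? stops at the first closing quote
negated = s[/title:"([^"]*)"/, 1]  # [^"]* can never step over a quote

p greedy   # => "Foo Bar\" y=\"z"
p lazy     # => "Foo Bar"
p negated  # => "Foo Bar"

# On an empty title they differ: .+? needs at least one character.
empty = 'title:""'
p empty[/title:"(.+?)"/, 1]    # => nil
p empty[/title:"([^"]*)"/, 1]  # => ""
```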
I like /title:"(.+?)"/ because of its use of lazy matching, which stops the .+ from consuming all the text up to the last " on the line.
It won't work if the string wraps lines or includes escaped quotes.
In programming languages where you want to be able to include the string delimiter inside a string, you usually provide an 'escape' character or sequence.
If your escape character is \, you could write something like this:
/title:"((?:\\"|[^"])+)"/
A railroad diagram of this regex shows the order in which things are parsed: imagine you are a train starting at the left. You consume title:", then \" if you can; if you can't, you consume any character that is not a ". The > marks the preferred path, so you try to loop again; when you can't, you have to consume a " to finish.
I made the diagram with https://regexper.com/#%2Ftitle%3A%22((%3F%3A%5C%5C%22%7C%5B%5E%22%5D)%2B)%22%2F
and there is now also a plugin for the Atom text editor that does this.
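The escape-aware pattern can be exercised in Ruby as below; the sample string is made up, and the gsub that turns \" back into " is an extra step, not part of the regex itself. Because [^"] also matches newlines, this version copes with titles that wrap lines. One known gap: a title ending in an escaped backslash (\\ right before the closing quote) is mis-split; the variant /title:"((?:\\.|[^"\\])*)"/ handles that case.

```ruby
s = 'meta title:"say \"hi\" now" tail'

# An escaped quote (\") or any non-quote character, one or more times.
raw = s[/title:"((?:\\"|[^"])+)"/, 1]

# Undo the escaping so each \" becomes a plain quote.
title = raw.gsub('\\"', '"')

puts raw    # prints: say \"hi\" now
puts title  # prints: say "hi" now
```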