I have a string:
html = 'class="repository-content frozen interface">'
The repository-content class can appear anywhere in the list:
html = 'class="frozen repository-content interface">'
Help me create a regular expression that deletes all classes except repository-content.
My version:
html.gsub(/[^\s?forbidden-word\s?]/, '')
But it doesn't work.
I think it's better to approach this the other way around (basically whitelisting instead of blacklisting). If you know the word you want to keep, you can do this:
irb(main):001:0> html = 'class="frozen repository-content-2 repository-content interface">'
=> "class=\"frozen repository-content-2 repository-content interface\">"
irb(main):002:0> html.gsub(/class=".*?(repository-content(?=[\s"])).*?"/, 'class="\1"')
=> "class=\"repository-content\">"
irb(main):003:0>
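So the idea here is to keep the word you want and remove the surrounding (forbidden) words.
.*? matches zero or more characters (non-greedy).
(?=[\s"]) is a lookahead that requires whitespace or the closing quote right after the word, which is why repository-content-2 is not matched.
\1 puts the captured word back inside class="...".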
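The whitelisting idea above can also be sketched without a single large pattern, by splitting the class list and keeping only the allowed names. This is a minimal sketch, not the answerer's method; the constant ALLOWED_CLASSES and the method keep_allowed_classes are hypothetical names introduced for illustration, and it assumes the attribute is always written as class="..." with double quotes.

```ruby
# Hypothetical whitelist: the class names that should survive.
ALLOWED_CLASSES = ['repository-content'].freeze

# Rewrite every class="..." attribute so it contains only whitelisted names.
def keep_allowed_classes(html, allowed = ALLOWED_CLASSES)
  html.gsub(/class="([^"]*)"/) do
    # $1 holds the raw class list; split on whitespace and filter by exact name.
    kept = $1.split.select { |name| allowed.include?(name) }
    %(class="#{kept.join(' ')}")
  end
end

puts keep_allowed_classes('class="frozen repository-content-2 repository-content interface">')
```

Because names are compared as whole strings, repository-content-2 is dropped without needing a lookahead, and more than one class can be whitelisted by extending the array.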
I'm using String#scan on my request URL to get the name of the user's currently selected category, but I've been having difficulty dealing with spaces and special characters.
request.url.scan(/\?category=\w+/).to_s.gsub('?category=', '')
URL examples followed by result
http://localhost:3000/search?category=dog&search=&utf8=%E2%9C%93 => ["dog"]
http://localhost:3000/search?category=dog.com&search=&utf8=%E2%9C%93 => ["dog"]
http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93 => ["dog"]
I'm trying to get ["dog"], ["dog.com"], and ["dog cat"], but am currently stuck. Any ideas?
Note: Considering removing spaces from categories and replacing them with dashes as multiple spaces could be problematic, but if it's possible to create one function to rule them all, that would be awesome.
This is Rails, is there a reason you're not just using params[:category]?
If you are trying to extract params, you could use Rack::Utils.parse_query:
require 'uri'
require 'rack'
uri = "http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93"
result = Rack::Utils.parse_query(URI(uri).query) #=> {"category"=>"dog cat", "search"=>"", "utf8"=>"\xE2\x9C\x93"}
result["category"] #=> "dog cat"
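Outside a Rails controller, where params[:category] and Rack may not be available, the same lookup can be sketched with the standard library alone. The method name category_from is hypothetical, and the sketch assumes a reasonably modern Ruby (URI.decode_www_form and Array#to_h).

```ruby
require 'uri'

# Return the decoded value of the "category" query parameter, or nil if absent.
def category_from(url)
  query = URI(url).query
  return nil if query.nil?

  # decode_www_form turns "+" into a space and decodes %XX escapes.
  URI.decode_www_form(query).to_h['category']
end

puts category_from('http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93')
```

Unlike the \w+ pattern in the question, this keeps dots and decodes spaces, because the decoding is done by the form-decoding routine rather than by a character class.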
I have an automated report tool (corp intranet) where the admins have a few text area boxes to enter some text for different parts of the email body.
What I'd like to do is parse the contents of the text area and wrap any hyperlinks found with link tags (so when the report goes out there are links instead of text urls).
Is there a simple way to do this, or do I have to parse the text myself and add link tags around anything that runs from one of ['http:', 'https:', 'ftp:'] to the first space after it?
Thank You!
Ruby 1.8.7, Rails 2.3.5
Make a helper:
def make_urls(text)
  urls = %r{(?:https?|ftp|mailto)://\S+}i
  # \0 is the whole match, so each URL becomes both the href and the link text
  html_text = text.gsub urls, '<a href="\0">\0</a>'
  html_text
end
In the view, just call this helper and you will get the expected output.
For example:
irb(main):001:0> string = 'here is a link: http://google.com'
=> "here is a link: http://google.com"
irb(main):002:0> urls = %r{(?:https?|ftp|mailto)://\S+}i
=> /(?:https?|ftp|mailto):\/\/\S+/i
irb(main):003:0> html = string.gsub urls, '<a href="\0">\0</a>'
=> "here is a link: <a href=\"http://google.com\">http://google.com</a>"
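Since the text comes from a text area and ends up in an HTML email, it may be worth escaping it while linking. The following is a minimal sketch of that idea, not part of the original answer: the names URL_PATTERN and link_urls are hypothetical, the pattern is narrowed to http, https, and ftp, and it additionally assumes that a URL never contains <, >, or a double quote.

```ruby
require 'cgi'

# One capture group, so String#split keeps the matched URLs in its result.
URL_PATTERN = %r{((?:https?|ftp)://[^\s<>"]+)}i

def link_urls(text)
  # split yields [text, url, text, url, ...]; odd indexes are the URLs.
  text.split(URL_PATTERN).each_with_index.map do |part, index|
    escaped = CGI.escapeHTML(part)
    index.odd? ? %(<a href="#{escaped}">#{escaped}</a>) : escaped
  end.join
end

puts link_urls('here is a link: http://google.com')
```

Escaping each piece separately means that stray markup typed by an admin is shown as text, while an ampersand inside a URL is still written correctly in the href.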
There are many ways to accomplish your goal. One way would be to use a regular expression (regex). If you have never heard of regex, the Wikipedia entry on regular expressions should bring you up to speed.
For example:
content_string = "Blah ablal blabla lbal blah blaha http://www.google.com/ adsf dasd dadf dfasdf dadf sdfasdf dadf dfaksjdf kjdfasdf http://www.apple.com/ blah blah blah."
content_string.split(/\s+/).find_all { |u| u =~ /^https?:/ }
Which will return: ["http://www.google.com/", "http://www.apple.com/"]
Now, for the second half of the problem, you will use the array returned above to substitute hyperlinks for the text links.
links = ["http://www.google.com/", "http://www.apple.com/"]
links.each do |l|
  content_string.gsub!(l, "<a href='#{l}'>#{l}</a>")
end
content_string will now be updated to contain HTML hyperlinks for all http/https URLs.
As I mentioned earlier, there are numerous ways to tackle this problem - to find the URLs you could also do something like:
require 'uri'
URI.extract(content_string, ['http', 'https'])
I hope this helps you.
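One thing to watch with the loop above: gsub! replaces every occurrence of a URL, so a URL that appears twice in the text is processed twice and ends up wrapped inside its own link. Splitting on whitespace also keeps any punctuation that directly follows a URL. A single pass with a block avoids both; this is only a sketch, the method name link_urls_trimmed is hypothetical, and the list of trailing punctuation characters is an assumption.

```ruby
# Link each http/https URL exactly once, leaving trailing punctuation outside the link.
def link_urls_trimmed(text)
  text.gsub(%r{https?://\S+}i) do |match|
    # Strip sentence punctuation that \S+ swallowed at the end of the URL.
    url = match.sub(/[.,;:!?)]+\z/, '')
    trailing = match[url.length..-1]
    "<a href='#{url}'>#{url}</a>#{trailing}"
  end
end

puts link_urls_trimmed('Read http://www.apple.com/. Then read http://www.apple.com/ again.')
```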
I'm importing an RSS feed which has a series of empty paragraphs "<p> </p>".
I am using gsub; however, it's not stripping the elements from the document:
document.gsub(/<p>\s*<\/p>/,"") or gsub(/<p> <\/p>/,"")
Is there an alternative method or a mistake in the above?
The below appears to work?
gsub(/<p>.<\/p>/,"")
A corrected regex, as in this example:
>> document = "<p>\n\n\n \n</p>aaa<p> </p>bbb"
=> "<p>\n\n\n \n</p>aaa<p> </p>bbb"
>> document.gsub(/<p>[\s$]*<\/p>/, '')
=> "aaabbb"
If the paragraph elements in your RSS feed use id and class attributes, try this:
gsub(/<p(\s(class|id)=['"][A-Za-z0-9_\-\s]+['"]\s*)*>\s*<\/p>/, "")
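One possible explanation for why `.` worked where `\s*` did not, which the thread does not confirm, is that the feed's "empty" paragraphs contain a non-breaking space (U+00A0 or the entity &nbsp;), which Ruby's \s does not match. A minimal sketch under that assumption; the names EMPTY_PARAGRAPH and strip_empty_paragraphs are hypothetical.

```ruby
# Matches <p>, optionally with attributes, whose body is only whitespace,
# non-breaking spaces (U+00A0), or &nbsp; entities.
EMPTY_PARAGRAPH = %r{<p(?:\s[^>]*)?>(?:\s|\u00A0|&nbsp;)*</p>}i

def strip_empty_paragraphs(html)
  html.gsub(EMPTY_PARAGRAPH, '')
end

puts strip_empty_paragraphs("<p>\u00A0</p>aaa<p> </p>bbb<p class=\"x\">&nbsp;</p>ccc<p>text</p>")
```

The `(?:\s[^>]*)?` part requires whitespace after `<p`, so other tags that merely start with p, such as `<pre>`, are left alone.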
Quick background: I have a string which contains references to other pages. The pages are linked to using the format: "#12". A hash followed by the ID of the page.
Say I have the following string:
str = 'This string links to the pages #12 and #125'
I already know the IDs of the pages that need linking:
page_ids = str.scan(/#(\d+)/).flatten
=> ["12", "125"]
How can I loop through the page ids and link the #12 and #125 to their respective pages? The problem I've run into is if I do the following (in rails):
page_ids.each do |id|
  str = str.gsub(/##{id}/, link_to("##{id}", page_path(id)))
end
This works fine for #12 but it links the "12" part of #125 to the page with ID of 12.
Any help would be awesome.
Instead of extracting the ids first and then replacing them, you can simply find and replace them in one go:
str = str.gsub(/#(\d+)/) { link_to("##{$1}", page_path($1)) }
Even if you can't leave out the extraction step because you need the ids somewhere else as well, this should be much faster, since it doesn't have to go through the entire string for each id.
PS: If str isn't referred to from anywhere else, you can use str.gsub! instead of str = str.gsub
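To try the single-pass replacement outside Rails, the two helpers can be replaced by simple stand-ins. In this sketch the definitions of link_to and page_path are hypothetical stubs, not the real Rails helpers, and the /pages/ID route is an assumption.

```ruby
# Hypothetical stand-ins for the Rails helpers, for illustration only.
def page_path(id)
  "/pages/#{id}"
end

def link_to(text, path)
  %(<a href="#{path}">#{text}</a>)
end

str = 'This string links to the pages #12 and #125'

# \d+ is greedy, so "#125" is consumed as a whole and never seen as "#12" plus "5".
linked = str.gsub(/#(\d+)/) { link_to("##{$1}", page_path($1)) }
puts linked
```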
If your IDs always end at word boundaries, you can match that:
page_ids.each do |id|
  str = str.gsub(/##{id}\b/, link_to("##{id}", page_path(id)))
end
You only need to add the word boundary symbol \b to the search pattern; it is not necessary in the replacement.
I want to remove all images from an HTML page (actually TinyMCE user input) that do not meet certain criteria (class = "int" or class = "ext"), and I'm struggling to find the correct approach. This is what I'm doing so far:
hbody = Hpricot(input)
@internal_images = hbody.search("//img[@class='int']")
@external_images = hbody.search("//img[@class='ext']")
But I don't know how to find images where the class has the wrong value (not "int" or "ext").
I also have to loop over the elements to check other attributes which are not standard html (I use them for setting internal values like the DB id, which I set in the attribute dbsrc). Can I access these attributes too and is there a way to remove certain elements (which are in the hpricot search result) when they don't meet my criteria?
Thanks for your help!
>> doc = Hpricot.parse('<html><img src="foo" class="int" /><img src="bar" bar="42" /><img src="foobar" class="int"></html>')
=> #<Hpricot::Doc {elem <html> {emptyelem <img class="int" src="foo">} {emptyelem <img src="bar" bar="42">} {emptyelem <img class="int" src="foobar">} </html>}>
>> doc.search("img")[1][:bar]
=> "42"
>> doc.search("img") - doc.search("img.int")
=> [{emptyelem <img src="bar" bar="42">}]
Once you have results from search, you can use normal array operations. Nonstandard attributes are accessible through [].
Check out the :not CSS selector.
(hbody/"img:not(.int)")
(hbody/"img:not(.ext)")
Unfortunately, it doesn't seem that you can chain :not expressions. You might want to fetch all img nodes and remove those whose class is neither int nor ext.
Additionally, you could use the difference operator to calculate which elements are not part of both collections.
Use the .remove method to remove nodes or elements: Hpricot Altering documentation.