I'm importing an RSS feed which has a series of empty paragraphs "<p> </p>".
I am using gsub, but it's not stripping the elements from the document:
document.gsub(/<p>\s*<\/p>/,"") or gsub(/<p> <\/p>/,"")
Is there an alternative method or a mistake in the above?
The below appears to work?
gsub(/<p>.<\/p>/,"")
The regex is correct, as in this example:
>> document = "<p>\n\n\n \n</p>aaa<p> </p>bbb"
=> "<p>\n\n\n \n</p>aaa<p> </p>bbb"
>> document.gsub(/<p>[\s$]*<\/p>/, '')
=> "aaabbb"
If the paragraph elements in your RSS feed use ids and classes, try this:
gsub(/\<p(\s((class)|(id))=[\'\"][A-z0-9\s]+[\'\"]\s*)*\>\s*\<\/p\>/,"")
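Alternatively, if regexes keep tripping over attribute variations, here is a sketch using an HTML parser instead (this assumes the nokogiri gem is available; it is not part of the original answers):
require 'nokogiri'  # assumption: the nokogiri gem is installed

fragment = Nokogiri::HTML::DocumentFragment.parse(document)
# remove any <p> whose text content is only whitespace, whatever its attributes
fragment.css('p').each { |p| p.remove if p.text.strip.empty? }
document = fragment.to_html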
I have a string:
html = 'class="repository-content frozen interface">'
The repository-content class can be located anywhere in the list:
html = 'class="frozen repository-content interface">'
Help me create a regular expression to delete all classes except repository-content
My version:
html.gsub(/[^\s?forbidden-word\s?]/, '')
But it doesn't work
I think it's better to think this the other way around (basically whitelisting instead of blacklisting). If you know the word you want to keep you can do this:
irb(main):001:0> html = 'class="frozen repository-content-2 repository-content interface">'
=> "class=\"frozen repository-content-2 repository-content interface\">"
irb(main):002:0> html.gsub(/class=".*?(repository-content(?=[\s"])).*?"/, 'class="\1"')
=> "class=\"repository-content\">"
irb(main):003:0>
So the idea here is to keep the word you want and remove surrounding (forbidden) words.
.*? this is to match 0 or more characters (non-greedy)
\1 is to keep the word you want inside the class="..."
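If the lookahead feels fragile, another sketch (assuming the string always contains exactly one class="..." attribute, which the question doesn't state) is to split the class list into whole tokens and keep only the wanted one; this also sidesteps the repository-content-2 problem:
# assumption: exactly one class="..." attribute in the string
classes = html[/class="([^"]*)"/, 1].to_s.split
wanted  = classes & ['repository-content']
html    = html.sub(/class="[^"]*"/, %{class="#{wanted.join(' ')}"})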
I'm using String#scan on my request URL in order to get the name of the user's currently selected category, but I've been having difficulty dealing with spaces and special characters.
request.url.scan(/\?category=\w+/).to_s.gsub('?category=', '')
URL examples followed by result
http://localhost:3000/search?category=dog&search=&utf8=%E2%9C%93 => ["dog"]
http://localhost:3000/search?category=dog.com&search=&utf8=%E2%9C%93 => ["dog"]
http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93 => ["dog"]
I'm trying to get ["dog"] ["dog.com"] and ["dog cat"], but am currently stuck. Any ideas?
Note: I'm considering removing spaces from categories and replacing them with dashes, as multiple spaces could be problematic, but if it's possible to create one function to rule them all, that would be awesome.
This is Rails, is there a reason you're not just using params[:category]?
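For completeness, a minimal sketch of what that looks like in a controller action (the action name is assumed from the /search URL; Rails decodes the + and percent-escapes for you):
def search
  category = params[:category]  # => "dog cat" for ?category=dog+cat
  # ...
end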
If you are trying to extract params then you could use parse_query:
uri = "http://localhost:3000/search?category=dog+cat&search=&utf8=%E2%9C%93"
result = Rack::Utils.parse_query(URI(uri).query) #=> {"category"=>"dog cat", "search"=>"", "utf8"=>"\xE2\x9C\x93"}
result["category"] #=> dog cat
I have an automated report tool (corp intranet) where the admins have a few text area boxes to enter some text for different parts of the email body.
What I'd like to do is parse the contents of the text area and wrap any hyperlinks found with link tags (so when the report goes out there are links instead of text urls).
Is there a simple way to do something like this without figuring out a way of parsing the text to add link tags around a found URL (from 'http:', 'https:', or 'ftp:' to the first space after it)?
Thank You!
Ruby 1.8.7, Rails 2.3.5
Make a helper:
def make_urls(text)
  urls = %r{(?:https?|ftp|mailto)://\S+}i
  # '\0' is the whole match, so each URL gets wrapped in an anchor tag
  html_text = text.gsub urls, '<a href="\0">\0</a>'
  html_text
end
In the view, just call this function and you will get the expected output, like:
irb(main):001:0> string = 'here is a link: http://google.com'
=> "here is a link: http://google.com"
irb(main):002:0> urls = %r{(?:https?|ftp|mailto)://\S+}i
=> /(?:https?|ftp|mailto):\/\/\S+/i
irb(main):003:0> html = string.gsub urls, '<a href="\0">\0</a>'
=> "here is a link: <a href=\"http://google.com\">http://google.com</a>"
There are many ways to accomplish your goal. One way would be to use regex. If you have never heard of regex, the Wikipedia entry on regular expressions should bring you up to speed.
For example:
content_string = "Blah ablal blabla lbal blah blaha http://www.google.com/ adsf dasd dadf dfasdf dadf sdfasdf dadf dfaksjdf kjdfasdf http://www.apple.com/ blah blah blah."
content_string.split(/\s+/).find_all { |u| u =~ /^https?:/ }
Which will return: ["http://www.google.com/", "http://www.apple.com/"]
Now, for the second half of the problem, you will use the array returned above to substitute the text links with hyperlinks.
links = ["http://www.google.com/", "http://www.apple.com/"]
links.each do |l|
content_string.gsub!(l, "<a href='#{l}'>#{l}</a>")
end
content_string will now be updated to contain HTML hyperlinks for all http/https URLs.
As I mentioned earlier, there are numerous ways to tackle this problem - to find the URLs you could also do something like:
require 'uri'
URI.extract(content_string, ['http', 'https'])
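Tying those two halves together, a sketch over the same content_string could be:
require 'uri'

# wrap every extracted http/https URL in an anchor tag (same caveats as the loop above)
URI.extract(content_string, ['http', 'https']).uniq.each do |url|
  content_string.gsub!(url, "<a href='#{url}'>#{url}</a>")
end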
I hope this helps you.
I'm writing a site with a custom tweet button that uses the www.twitter.com/share function, however the problem I am having is including hash '#' characters within the tweet text.
For example:
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+#branstonpickel+right+now
The tweet text comes out as 'I am eating' and omits the hash and everything after.
I had a quick look on the Twitter forums and learnt the hash '#' character cannot be part of the share url. On https://dev.twitter.com/discussions/512#comment-877 it was said that:
Hashes are special characters in the URL (they identify document fragments), so they, and anything following, do not get sent to the server.
and
you need to URLEncode it, so use %23
When I tried the 2nd point in my test link:
www.twitter.com/share?url=www.example.com&text=I+am+eating+%23branstonpickel+right+now
The tweet text came out as 'I am eating %23branstonpickel right now' literally including %23 instead of converting it to a hash.
Sorry for the waffly question, but does anyone know what it is I'm doing wrong?
Any feedback would be greatly appreciated :)
It looks like this is the basic setup:
https://twitter.com/intent/tweet?
url=<url to tweet>
text=<text to tweet>
hashtags=<comma separated list of hashtags, with no # on them>
This would pre-build a tweet of: <text> <url> <hashtags>
The above example would be:
https://twitter.com/intent/tweet?url=http://www.example.com&text=I+am+eating+branston+pickel+right+now&hashtags=bransonpickel,pickles
There used to be a bug with the hashtags parameter... it only showed the first n-1 hashtags. Currently this is fixed.
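In Ruby (to match the rest of this page), a sketch that builds the same intent URL with URI.encode_www_form, which takes care of all the escaping (the parameter values are just the example ones from above):
require 'uri'

params = {
  url:      'http://www.example.com',
  text:     'I am eating branston pickel right now',
  hashtags: 'branstonpickel,pickles'  # no leading # needed
}
tweet_url = 'https://twitter.com/intent/tweet?' + URI.encode_www_form(params)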
You can use %23 instead of the hash (#) in the URL, e.g.:
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+%23branston+%23pickel+right+now
I may be wrong, but I think the hashtag has to be passed as a separate variable that will appear at the end of your tweet, i.e.:
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+branston+pickel+right+now&hashtag=bransonpickel
will result in "I am eating branston pickel right now #branstonpickle"
On a separate note, I think pickel should be pickle!
Use encodeURIComponent to encode the URL.
If you're using PHP, you can use the following:
<?php echo 'http://www.twitter.com/share?' . http_build_query(array(
  'url'  => 'http://www.example.com',
  'text' => 'I am eating #branstonpickel right now'
)); ?>
This will do all the URL encoding for you, and it's easy to read.
For more information on the http_build_query, see the PHP manual:
http://us2.php.net/http_build_query
For a URL with line breaks, #, @, and special Unicode characters in it, the following works:
var lineJump = encodeURI(String.fromCharCode(10)), // an encoded newline (%0A)
    hash = "%23", arobase = "%40",
    // hans, item and lesson come from the poster's own application
    tweetText = 'https://twitter.com/intent/tweet?text=Le signe chinois ' + hans + ' ' + item.pinyin + ': ' + item.definition.replace(";", ",") + '.'
      + lineJump + 'Merci ' + arobase + 'Inalco_Officiel ' + arobase + 'CRIparis ❤️🇨🇳 '
      + lineJump + hash + 'Chinois ' + hash + 'MOOC'
      + lineJump + 'https://hanzi.cri-paris.org/',
    tweetTxtUrlEncoded = tweetText + "" + encodeURIComponent('#' + lesson + encodeURIComponent(hans));
Use urlencode:
https://twitter.com/intent/tweet?text=<?= urlencode("I am eating #branstonpickel right now"); ?>
You can just use this code and modify it.
%20 means space
%23 means hashtag
In JS you can easily encode the special characters using encodeURIComponent.
(Warning: don't use encodeURI, as "#" and "@" are not escaped by it.)
Here's an example with mention and hashtag:
const text = "Hello #world ! Go follow #StackOverflow";
const tweetUrl = `https://twitter.com/intent/tweet?text=${ encodeURIComponent(text) }`;
I have a Rails site, where the content is written in markdown. I wish to display a snippet of each, with a "Read more.." link.
How do I go about this? Simply truncating the raw text will not work, for example:
>> "This is an [example](http://example.com)"[0..25]
=> "This is an [example](http:"
Ideally I want to allow the author to (optionally) insert a marker to specify what to use as the "snippet", if not it would take 250 words, and append "..." - for example..
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
The marker could be thought of like an EOF marker (which can be ignored when displaying the full document)
I am using maruku for the Markdown processing (RedCloth is very biased towards Textile, BlueCloth is extremely buggy, and I wanted a native-Ruby parser which ruled out peg-markdown and RDiscount)
Alternatively (since the Markdown is translated to HTML anyway) truncating the HTML correctly would be an option - although it would be preferable to not markdown() the entire document, just to get the first few lines.
So, the options I can think of are (in order of preference)..
Add a "truncate" option to the maruku parser, which will only parse the first x words, or till the "excerpt" marker.
Write/find a parser-agnostic Markdown truncate'r
Write/find an intelligent HTML truncating function
Regarding "Write/find an intelligent HTML truncating function":
The following, from http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/ with some modifications, will correctly truncate HTML and easily allow appending a string before the closing tags.
>> puts "<p><b>Something</p>".truncate_html(5, at_end = "...")
=> <p><b>Someth...</b></p>
The modified code:
require 'rexml/parsers/pullparser'

class String
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        results << p_e[0][0..new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    # append the optional suffix before closing any tags that are still open
    results << at_end if at_end
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end

  private

  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end
Here's a solution that works for me with Textile.
Convert it to HTML
Truncate it.
Remove any HTML tags that got cut in half with
html_string.gsub(/<[^>]*$/, "")
Then use Hpricot to clean it up and close any unclosed tags:
html_string = Hpricot( html_string ).to_s
I do this in a helper, and with caching there's no performance issue.
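Put together, that helper might look something like the sketch below (the method name, the default length, and using RedCloth for the Textile step are my assumptions, not the answerer's exact code):
def html_snippet(textile_source, length = 250)  # sketch; name and length are assumptions
  html = RedCloth.new(textile_source).to_html   # 1. convert it to HTML
  html = html[0, length]                        # 2. truncate it
  html = html.gsub(/<[^>]*$/, "")               # 3. remove any tag that got cut in half
  Hpricot(html).to_s                            # 4. Hpricot closes unclosed tags
end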
You could use a regular expression to find a line consisting of nothing but "^" characters:
markdown_string = <<-eos
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
eos
preview = markdown_string[0...(markdown_string =~ /^\^+$/)]
puts preview
Rather than trying to truncate the text, why not have two input boxes, one for the "opening blurb" and one for the main "guts"? That way your authors will know exactly what is being shown where, without having to rely on some sort of funky EOF marker.
I have to agree with the "two inputs" approach, and the content writer need not worry, since you can modify the background logic to combine the two inputs into one when showing the full content.
full_content = input1 + input2 # perhaps with some complementary HTML, for better formatting
Not sure if it applies to this case, but I'm adding the solution below for the sake of completeness. You can use the strip_tags method if you are truncating Markdown-rendered content:
truncate(strip_tags(markdown(article.contents)), length: 50)
Sourced from:
http://devblog.boonecommunitynetwork.com/rails-and-markdown/
A simpler option that just works:
truncate(markdown(item.description), length: 100, escape: false)