Find all URLs in a string in Rails 3 - ruby-on-rails

I'm building an app in Rails 3 and I need a method to extract all URLs from a string and store them in a hash or a similar collection. I know I need to use regular expressions, but I don't know where to begin with them.
Also, I know about auto_link, but it doesn't quite do what I'm trying to achieve. I simply need a hash of all the URLs from a string.
Thanks!

From http://www.regular-expressions.info/ruby.html
"To collect all regex matches in a string into an array, pass the regexp object to the string's scan() method, e.g.: myarray = mystring.scan(/regex/)."
So you probably need to match strings that start with "http". Check the docs for that :)
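As a rough sketch of the scan approach quoted above: the pattern and the extract_urls helper are illustrative assumptions, not a complete URL grammar, but something along these lines collects every http or https URL into an array.

```ruby
# Hypothetical helper. The pattern is a deliberately simple assumption: it
# treats everything from "http://" or "https://" up to the next whitespace,
# quote, or angle bracket as a URL.
URL_PATTERN = %r{https?://[^\s<>"']+}

def extract_urls(text)
  # String#scan returns an array of every match; uniq drops repeated URLs.
  text.scan(URL_PATTERN).uniq
end

sample = "See http://example.com and https://example.org/docs?page=2 for details."
urls = extract_urls(sample)
puts urls
```

Note that punctuation glued to the end of a URL (for example a full stop closing a sentence) would be captured too, so you may want to trim it afterwards.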

I don't program in Ruby and I'm not very good with regex but maybe this will help you out:
http://www.ozzu.com/programming-forum/url-regex-t104809.html

Related

Need some help regarding .NET MVC URL routing/rewriting

The URL is something like
/home/rawstring13245/rawstring534533453
I want a rule that saves only 13245 to one parameter and 534533453 to another, ignoring the raw strings before them.
How can I achieve this in the route.config file?
I want this because the strings are not parameters; I need only the parameters out of the string,
like:
url:"{controller}/rawstring{action}/rawstring{id}",
What should I enter in place of the raw string? I don't need those strings. Also, each raw string has the same length: 10.
You can capture the complete segments as strings and get the part you need with the substring method. It's the quickest solution; I think it will solve your problem.

extract information from particular string

I am new to Ruby on Rails. I have passed the parameters ISDCode, AreaCode and Telephone number using POST from a form.
I have a string with information of the format countryName(ISDCode) passed in the variable ISDCode. For example "United States of America(+1)".
Now I want to save only the value of the ISDCode in the database.
What would be the ideal way to extract the ISD Code from the string?
Should I extract the ISD Code in JavaScript before the user POSTs the form, or should I extract it in the model using a callback?
Also is regex the only way to extract the information?
Since the string comes from auto completion, the ISD codes should already exist in your database. So the best solution may be to include an extra parameter (with a hidden input), like isdcode_id, and then simply use isdcode_id in your model. This way you avoid the trouble of parsing the string.
If this is not feasible, regex could be the best way to extract the information. You can override the setter in the model to do it.
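A minimal sketch of the setter idea, using a plain Ruby class as a stand-in for the model. The PhoneNumber class and ISD_PATTERN are hypothetical names, and in a real ActiveRecord model the value would be written to the attribute rather than to an instance variable.

```ruby
# Plain Ruby stand-in for the model (hypothetical class name).
class PhoneNumber
  attr_reader :isd_code

  # Assumes the code always appears as "(+digits)" at most four digits long,
  # as in "United States of America(+1)".
  ISD_PATTERN = /\((\+\d{1,4})\)/

  # Custom setter: keeps only the "+1" part of the submitted string,
  # or nil when the string has no ISD code in parentheses.
  def isd_code=(value)
    match = ISD_PATTERN.match(value.to_s)
    @isd_code = match ? match[1] : nil
  end
end

phone = PhoneNumber.new
phone.isd_code = "United States of America(+1)"
puts phone.isd_code
```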
Use a regular expression to match the ISD code:
"United States of America(+1)" =~ /(\+[\d]+)/
puts $1
If you are interested in getting just the ISD Code alone, this should work:
"United States of America(+1)".gsub(/[^+\d]/, "")
NB: You can put this in a view helper and call the helper on the string before persisting it.
Already answered, but I'd like to offer an alternative to getting the ISD Code:
isd = "United States(+1)"
puts isd[/[+]*[\d]{1,4}/] # +1
This regexp matches:
0001
+1
+01
etc.
I prefer to use JS to extract the information on the client side and to validate it in the model. This way, you get what you want and can make sure it's correct.

Clean up & style characters from text

I am getting text from a feed that has a lot of characters like:
Insignia™ 2.0 Stereo Computer Speaker System (2-Piece) - Black
4th-Generation AppleĀ® iPodĀ® touch
Is there an easy way to get rid of these, or do I have to anticipate which characters I want to delete and use the delete method to remove them? Also, when I try to remove
&amp;
with
str.delete("&")
it leaves behind "amp;". Is there a better way to delete this type of character? Do I need to re-encode the text?
String#delete is certainly not what you want, as it works on individual characters, not on the string as a whole.
Try
str.gsub(/&amp;/, "")
You may also want to try replacing the &amp; with a literal ampersand, such as:
str.gsub(/&amp;/, "&")
If this is closer to what you really want, you may get the best results unescaping the HTML string. If so try this:
CGI::unescapeHTML(str)
Details of the unescapeHTML method are in the Ruby CGI documentation.
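For illustration, a small sketch of the unescaping approach. The sample string is made up, on the assumption that the feed text contains entities such as &#174; and &amp;.

```ruby
require "cgi"

# Made-up feed text containing a numeric entity and a named entity.
raw = "Apple&#174; iPod&#174; touch &amp; speakers"

# CGI.unescapeHTML decodes &amp;, &lt;, &gt;, &quot; and numeric entities.
clean = CGI.unescapeHTML(raw)
puts clean
```

Named entities beyond the basic set (for example &trade;) are not decoded by this method, so an HTML or XML parser may still be the better fit for feed data.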
If you are getting data from a 'feed', aka RSS XML, then you should be using an XML parser like Nokogiri to process the XML. This will automatically unescape HTML entities and allow you to get the proper string representation directly.
To remove it, try the gsub method, something like this:
text = "foo&bar"
text.gsub /\b&\b/, "" #=> foobar

What is the proper way to sanitize user input when using a Ruby system call?

I have a Ruby on Rails Application that is using the X virtual framebuffer along with another program to grab images from the web. I have structured my command as shown below:
xvfb-run --server-args="-screen 0 1024x768x24" /my/c++/app #{user_provided_url}
What is the best way to make this call in rails with the maximum amount of safety from user input?
You probably don't need to sanitize this input in rails. If it's a URL and it's in a string format then it already has properly escaped characters to be passed as a URL to a Net::HTTP call. That said, you could write a regular expression to check that the URL looks valid. You could also do the following to make sure that the URL is parse-able:
uri = URI.parse(user_provided_url)
You can then query the object for its relevant parts:
uri.path
uri.host
uri.port
Maybe I'm wrong, but why don't you just make sure that the given string really is a URL (URI::parse), surround it with single quotes, and escape any single quote (') character that appears inside?
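One possible sketch that combines the URI.parse check with passing the command as an argument list. The helper names safe_http_url and capture_command are hypothetical. When Kernel#system is given several arguments, it runs the program directly without a shell, so shell metacharacters in the URL are not interpreted and no manual quoting is needed.

```ruby
require "uri"

# Hypothetical helper: returns the URL string if it parses as http or https
# with a host, otherwise nil.
def safe_http_url(input)
  uri = URI.parse(input.to_s)
  return nil unless uri.is_a?(URI::HTTP) && uri.host
  uri.to_s
rescue URI::InvalidURIError
  nil
end

# Hypothetical helper: builds the command from the question as a list of
# arguments, one array element per argument.
def capture_command(url)
  ["xvfb-run", "--server-args=-screen 0 1024x768x24", "/my/c++/app", url]
end

url = safe_http_url("http://example.com/page?id=1")
argv = capture_command(url) if url
# system(*argv) would then run the command without going through a shell.
puts argv.inspect
```

The validation step alone is not enough, because characters such as $ and ( are legal in a URL path; it is the argument-list form of system that keeps them away from the shell.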

Regular expression not working when put in an object

I'm trying to store regexes in a database but they're not working when used in a .sub(), even though the same regex works when used directly in .sub() as a string.
regex = Class.object.field # Class.object is an ActiveRecord object containing "\w*\s\/\s"
mystring = "first / second"
mystring.sub(/#{regex}/, '')
# => nil
mystring.sub(/\w*\s\/\s/, '')
# => second
Any insight appreciated!
Thanks,
Matt.
Editing to correct class/object terminology (thanks) & correcting my 2nd example as I had shown #{} wrapped around the working regex (cut & paste SNAFU).
To answer your question: it is not quite clear what kind of thing your Class.object is. If it's an ActiveRecord, it won't work.
Edit: You obviously found that the problem is Rails escaping the regexp.
An ActiveRecord cannot "contain" your regular expression directly; the regexp will be in one of the fields of your record. In which case you'd want to do something like regexp = Class.object.field_containing_the_regexp.
Even if that is not the case, I suspect that the problem is that your regexp is something other than a string. You can quickly test this by using
puts "My regexp: #{regexp}"
The string that you will see in the output will be the one that is used for the regexp.
A String is not a Regexp. You have to create a Regexp object first.
regex = Regexp.new('\w*\s\/\s')
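A short sketch of building the Regexp from a stored string. Here stored_pattern is a stand-in for the database field; it uses single quotes because in a double-quoted Ruby literal \w would become a plain w and \s a space.

```ruby
# Stand-in for the value read from the database column. Single quotes keep
# the backslashes exactly as they would be stored.
stored_pattern = '\w*\s\/\s'

# Build a Regexp object from the stored text.
regex = Regexp.new(stored_pattern)

mystring = "first / second"
result = mystring.sub(regex, "")
puts result
```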
Turns out my regexp didn't cater for all cases: \w didn't account for symbols. After checking in the Rails console and seeing the screwy escaping, I was already half-way down the wrong track.
Thanks for the help.

Resources