How to retrieve the 'scheme://domain' part of a generic URL? - ruby-on-rails

I am using Ruby on Rails 3.0.10 and I would like to retrieve the scheme://domain part of a generic URL (note: the full URL syntax can also be scheme://domain:port/path?query_string#fragment_id).
That is, for example, if I have the following URLs
http://stackoverflow.com/questions/7304043/how-to-retrieve-the-scheme-domainport-part-of-a-generic-url
ftp://some_link.org/some_other_string
# Consider also 'https', 'ftps' and so on... 'scheme:' values.
I would like to retrieve just the
# Note: The last '/' character was removed.
http://stackoverflow.com
ftp://some_link.org
# Consider also 'https', 'ftps' and so on... 'scheme:' values.
parts. How can I do that?

require 'uri'
uri = URI.parse("http://stackoverflow.com/questions/7304043/how-to-retrieve-the-scheme-domainport-part-of-a-generic-url")
url = "#{uri.scheme}://#{uri.host}" #url would be set to http://stackoverflow.com
This uses the URI module from Ruby's standard library.
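Since the question notes that a URL may also carry a port, the same idea can be wrapped in a small helper. This is only a sketch: the name scheme_and_host and the include_port option are made up for illustration, and the example.com URL is a placeholder.

```ruby
require 'uri'

# Hypothetical helper: returns "scheme://host", optionally followed by
# ":port" when the port differs from the scheme's default.
def scheme_and_host(url, include_port: false)
  uri = URI.parse(url)
  base = "#{uri.scheme}://#{uri.host}"
  # uri.default_port is 80 for http, 443 for https, 21 for ftp
  if include_port && uri.port && uri.port != uri.default_port
    base += ":#{uri.port}"
  end
  base
end

puts scheme_and_host("http://stackoverflow.com/questions/7304043/how-to-retrieve-the-scheme-domainport-part-of-a-generic-url")
puts scheme_and_host("https://example.com:8443/path?query#fragment", include_port: true)
```

Leaving include_port at its default drops the port entirely, which matches the output asked for in the question.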

Related

How to get the hostname from a url with accented letters inside in Ruby

I have the following url inside a field of model:
https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potrà_non_rispettare_il/?sort=new
Inside the URL there is an accented letter (à). If I use URI.parse to get the hostname, I get the following error:
URI::InvalidURIError: URI must be ascii only "https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potr\u00E0_non_rispettare_il/?sort=new"
The method URI.encode resolves the problem, but I discovered that URI.encode is obsolete and should not be used.
Which method should I use to replace URI.encode?
This is an encoding issue, and you can handle it as follows.
First, encode the URI:
encoded_url = URI.encode('https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potrà_non_rispettare_il/?sort=new')
Then parse it:
URI.parse(encoded_url)
good luck
The only solution I found uses the Addressable gem (https://github.com/sporkmonger/addressable):
Addressable::URI.parse('https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potrà_non_rispettare_il/?sort=new').host
Perhaps this could be an inelegant solution:
URI.parse(URI.extract(target.url).first)
# => #<URI::HTTPS https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potr>
Then I call the host method:
URI.parse(URI.extract(target.url).first).host
# => "www.reddit.com"
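URI.encode was removed in Ruby 3.0, so a standard-library-only alternative may be useful. The sketch below percent-encodes just the non-ASCII characters before parsing; the helper name escape_non_ascii is hypothetical, and it assumes the rest of the URL is already valid.

```ruby
require 'uri'

# Hypothetical helper: percent-encodes only the non-ASCII characters,
# leaving reserved characters such as ':', '/', '?' and '=' untouched.
def escape_non_ascii(url)
  url.gsub(/[^[:ascii:]]/) do |char|
    # Encode each UTF-8 byte of the character as %XX
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

# "\u00E0" is the accented letter from the question
url = "https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potr\u00E0_non_rispettare_il/?sort=new"
escaped = escape_non_ascii(url)
host = URI.parse(escaped).host
puts host
```

Unlike the URI.extract approach, this keeps the whole URL intact, so the path and query remain available after parsing.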

How to validate URIs by checking if those refer to a given domain name?

I am running Ruby on Rails 4.1.1 and I would like to validate URIs by checking if those are within a given domain name. That is, a uri string can be sent as params to my controller and I would like to check if that uri "refers" to my application domain name www.myapp.com. Of course, the uri should be a valid URI reference.
# Invalid URIs
www.website.com
http://www.website.com
http://www.website.com/
https://www.website.com/
ftp://www.website.com/
ftps://www.website.com/
http://www.website.com/some/path
# Valid URIs
www.myapp.com
http://www.myapp.com
http://www.myapp.com/
https://www.myapp.com/
ftp://www.myapp.com/
ftps://www.myapp.com/
http://www.myapp.com/some/path
How can I do that (ideally by using just Ruby on Rails)?
Note: I am validating URIs in order to allow users to set custom redirects and I am aware of vulnerabilities. I plan to run validations in a method stated in my application_controller.rb.
require 'uri'
uri = URI.parse 'http://www.myapp.com/some/path'
# true only when the host is exactly the application's domain name
is_ok = (uri.host == 'www.myapp.com')
#⇒ true
Hope it helps.
Currently Addressable is my preferred choice for working with URIs:
require 'addressable/uri'
Addressable::URI.parse('http://www.myapp.com/some/path').host == 'www.myapp.com'
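The question also lists scheme-less input such as www.myapp.com as valid, and URI.parse gives such a string no host at all. A possible sketch that covers this case with the standard library only is shown here; the method name refers_to_app? and the ALLOWED_HOST constant are assumptions for illustration, not part of either answer.

```ruby
require 'uri'

ALLOWED_HOST = 'www.myapp.com' # application domain taken from the question

# Hypothetical validator: true when the value parses as a URI reference
# whose host is exactly the allowed host (compared case-insensitively).
def refers_to_app?(value, allowed_host = ALLOWED_HOST)
  # "www.myapp.com" alone parses as a path, so prefix "//" to have it
  # read as a network-path reference with a host.
  candidate = value.include?('://') ? value : "//#{value}"
  uri = URI.parse(candidate)
  uri.host.to_s.downcase == allowed_host.downcase
rescue URI::InvalidURIError
  false
end

puts refers_to_app?('www.myapp.com')
puts refers_to_app?('http://www.website.com/some/path')
```

Comparing the parsed host, rather than searching the string for the domain, avoids accepting values that merely contain www.myapp.com somewhere else in the URL.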

Remove URL prefix in Ruby

I have an S3 URL like so:
https://bucket.s3.amazonaws.com/uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg
What I'd like to do, is, using Ruby, remove the prefix https://bucket.s3.amazonaws.com/ leaving only uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg.
I'm unsure whether using gsub and just replacing the (hardcoded) prefix with an empty string is the right way to go, or if there's a more efficient approach.
url.gsub('https://bucket.s3.amazonaws.com/', '')
You can use URI from ruby's standard library:
irb> require 'uri'
=> true
irb> u = URI("https://bucket.s3.amazonaws.com/uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg")
=> #<URI::HTTPS:0x000000020995f0 URL:https://bucket.s3.amazonaws.com/uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg>
irb> u.path
=> "/uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg"
Alternatively, u.request_uri returns the path together with any query string on the URI.
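Note that u.path keeps the leading slash, while the question asks for the key without it. A minimal sketch of stripping it, using the URL from the question (the variable name key is just illustrative):

```ruby
require 'uri'

url = "https://bucket.s3.amazonaws.com/uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg"

# Take the path component and drop a single leading "/"
key = URI.parse(url).path.sub(%r{\A/}, '')
puts key
```

This works regardless of the bucket host, so the prefix does not need to be hardcoded.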

Get URL without filename?

I'm trying to figure out how to parse a URL in Rails and return everything except the filename, that is, everything except what follows the last forward slash.
For example, I'd like:
http://bucket.s3.amazonaws.com/directoryname/1234/thumbnail.jpg
to become:
http://bucket.s3.amazonaws.com/directoryname/1234/
I've found every way to parse a URI, but this. Any help would be appreciated.
Ruby has methods available to get you there easily:
File.dirname(URL) # => "http://bucket.s3.amazonaws.com/directoryname/1234"
Think about what a URL/URI is: It's a designator for a protocol, a site, and a directory-path to a resource. The directory-path to a resource is the same as a "path/to/file", so File.dirname works nicely, without having to reinvent that particular wheel.
The trailing / isn't included because it's a delimiter between the path segments. You generally don't need that, because joining a resource to a path will automatically supply it.
Really though, using Ruby's URI class is the proper way to manipulate URIs:
require 'uri'
URL = 'http://bucket.s3.amazonaws.com/directoryname/1234/thumbnail.jpg'
uri = URI.parse(URL)
uri.merge('foo.html').to_s
# => "http://bucket.s3.amazonaws.com/directoryname/1234/foo.html"
URI allows you to mess with the path easily too:
uri.merge('../foo.html').to_s
# => "http://bucket.s3.amazonaws.com/directoryname/foo.html"
uri.merge('../bar/foo.html').to_s
# => "http://bucket.s3.amazonaws.com/directoryname/bar/foo.html"
URI is well-tested, and designed for this purpose. It will also allow you to add query parameters easily, encoding them as necessary.
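The same merge call can also produce the directory URL that the question asks for, trailing slash included, by resolving the relative reference '.' against the original URL. A short sketch (the variable names are illustrative):

```ruby
require 'uri'

url = 'http://bucket.s3.amazonaws.com/directoryname/1234/thumbnail.jpg'
uri = URI.parse(url)

# Resolving "." against the URL yields its containing directory
directory = uri.merge('.').to_s
puts directory
```

Unlike File.dirname, this keeps the trailing slash, because relative resolution treats the last path segment as the resource and everything before it as the base directory.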
File name
"http://bucket.s3.amazonaws.com/directoryname/1234/thumbnail.jpg".match(/(.*\/)+(.*$)/)[2]
=> "thumbnail.jpg"
URL without the file name
"http://bucket.s3.amazonaws.com/directoryname/1234/thumbnail.jpg".match(/(.*\/)+(.*$)/)[1]
=> "http://bucket.s3.amazonaws.com/directoryname/1234/"
String#match
'http://a.b.pl/a/b/c/d.jpg'.rpartition('/').first
=> "http://a.b.pl/a/b/c"

Ruby on Rails Directory Path

I need to print the complete URL of a file in my Ruby on Rails application. In detail:
with RAILS_ROOT I am getting a path like this
D:/projects/rails_app/projectname/data/default.jpg
But for my application I need a path like this
http://localhost:3000/data/default.jpg
Please help me to solve this issue. I am using Rails 2
Thanks
Today we use URI. Simply require the library and you will be able to parse your current dynamic and static URI any way you please. For example I have a function that can read URI parameters like so...
#{RAILS_ROOT}/app/helpers/application_helper.rb (The literal path string of the file depicted below)
def read_uri(parameter)
  require 'uri'
  @raw_uri = URI.parse(request.original_fullpath)
  @uri_params_raw = @raw_uri.query
  if @uri_params_raw =~ /\=/
    @uri_vars = @uri_params_raw.split('=')
    return @uri_vars[parameter]
  end
  return false
end
This should split all URI parameters into an array that gives the requested (numeric) "parameter".
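Splitting on '=' mixes keys and values once there is more than one parameter. As an alternative sketch outside Rails, the standard library's URI.decode_www_form can turn the query string into key/value pairs; the method name query_param and the sample path below are hypothetical.

```ruby
require 'uri'

# Hypothetical helper: looks up one query parameter by name,
# returning nil when the URL has no query or the name is absent.
def query_param(url, name)
  query = URI.parse(url).query
  return nil if query.nil?
  # decode_www_form returns [[key, value], ...]; to_h makes it a Hash
  URI.decode_www_form(query).to_h[name]
end

puts query_param('/data/default.jpg?size=large&page=2', 'page')
```

Looking parameters up by name rather than by numeric position also keeps the result stable when the order of the query string changes.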
I believe that simply the URI.parse(request.original_fullpath) should work for you.
I was using at least Rails 4.2.6 with this method, so I hope it works for anyone who views this later on. Oh, and just as a disclaimer: I wasn't very experienced with Rails at the time of posting this.
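For the original goal of turning a filesystem path under the application root into a URL, one possible plain-Ruby sketch is to strip the root and join the remainder onto a base URL. The helper name public_url is hypothetical, and the base URL is hard-coded here; in a Rails controller it would typically be built from the request (for example from request.protocol and request.host_with_port).

```ruby
require 'uri'

# Hypothetical helper: maps a file path under `root` to a URL under `base_url`.
def public_url(base_url, root, file_path)
  unless file_path.start_with?(root)
    raise ArgumentError, "#{file_path} is not under #{root}"
  end
  # Keep the part after the root, e.g. "/data/default.jpg"
  relative = file_path[root.length..-1]
  URI.join(base_url, relative).to_s
end

puts public_url('http://localhost:3000',
                'D:/projects/rails_app/projectname',
                'D:/projects/rails_app/projectname/data/default.jpg')
```

This assumes the file is actually served at the same relative path by the web server, which depends on how the application exposes the data directory.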
