I'm trying to figure out how to parse a URL in Rails and return everything except the filename, that is, everything except what follows the last forward slash.
For example, I'd like:
http://bucket.s3.amazonaws.com/directoryname/1234/thumbnail.jpg
to become:
http://bucket.s3.amazonaws.com/directoryname/1234/
I've found every way to parse a URI except this one. Any help would be appreciated.
Ruby has methods available to get you there easily:
File.dirname(URL) # => "http://bucket.s3.amazonaws.com/directoryname/1234"
Think about what a URL/URI is: It's a designator for a protocol, a site, and a directory-path to a resource. The directory-path to a resource is the same as a "path/to/file", so File.dirname works nicely, without having to reinvent that particular wheel.
The trailing / isn't included because it's a delimiter between the path segments. You generally don't need that, because joining a resource to a path will automatically supply it.
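To illustrate the point about joining, here is a minimal sketch. The helper name `sibling_url` and the file name `foo.html` are made up for the example, and it assumes a Unix-style path separator.

```ruby
# Hypothetical helper: swaps the last path segment of a URL for another file name.
def sibling_url(url, filename)
  dir = File.dirname(url)   # drops the last segment and the trailing "/"
  File.join(dir, filename)  # File.join puts the "/" back between the parts
end

sibling_url('http://bucket.s3.amazonaws.com/directoryname/1234/thumbnail.jpg', 'foo.html')
# => "http://bucket.s3.amazonaws.com/directoryname/1234/foo.html"
```

If you only need the directory with its trailing slash, appending "/" to the result of File.dirname is enough.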
Really though, using Ruby's URI class is the proper way to mangle URIs:
require 'uri'
URL = 'http://bucket.s3.amazonaws.com/directoryname/1234/thumbnail.jpg'
uri = URI.parse(URL)
uri.merge('foo.html').to_s
# => "http://bucket.s3.amazonaws.com/directoryname/1234/foo.html"
URI allows you to mess with the path easily too:
uri.merge('../foo.html').to_s
# => "http://bucket.s3.amazonaws.com/directoryname/foo.html"
uri.merge('../bar/foo.html').to_s
# => "http://bucket.s3.amazonaws.com/directoryname/bar/foo.html"
URI is well-tested, and designed for this purpose. It will also allow you to add query parameters easily, encoding them as necessary.
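The same merge mechanism can also produce exactly what the question asks for. Merging the relative reference "." resolves to the directory of the current resource, trailing slash included. This is a small sketch; the helper name `directory_url` is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: resolves "." against the URL, which drops the last
# path segment (and any query string) and keeps the trailing slash.
def directory_url(url)
  URI.parse(url).merge('.').to_s
end

directory_url('http://bucket.s3.amazonaws.com/directoryname/1234/thumbnail.jpg')
# => "http://bucket.s3.amazonaws.com/directoryname/1234/"
```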
File name
"http://bucket.s3.amazonaws.com/directoryname/1234/thumbnail.jpg".match(/(.*\/)+(.*$)/)[2]
=> "thumbnail.jpg"
URL without the file name
"http://bucket.s3.amazonaws.com/directoryname/1234/thumbnail.jpg".match(/(.*\/)+(.*$)/)[1]
=> "http://bucket.s3.amazonaws.com/directoryname/1234/"
String#match
'http://a.b.pl/a/b/c/d.jpg'.rpartition('/').first
=> "http://a.b.pl/a/b/c"
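Note that rpartition returns the head without the slash. If the trailing slash should be kept, as in the question, the separator can be glued back on. A minimal sketch follows; the helper name `strip_filename` is made up for illustration.

```ruby
# Hypothetical helper: keeps everything up to and including the last "/".
def strip_filename(url)
  head, slash, _filename = url.rpartition('/')
  head + slash
end

strip_filename('http://a.b.pl/a/b/c/d.jpg')
# => "http://a.b.pl/a/b/c/"
```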
Related
I have the following url inside a field of model:
https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potrà_non_rispettare_il/?sort=new
Inside the URL there is an accented letter (à). If I use URI.parse to get the hostname, I get the following error:
URI::InvalidURIError: URI must be ascii only "https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potr\u00E0_non_rispettare_il/?sort=new"
The method URI.encode resolves the problem, but I discovered that URI.encode is obsolete and should not be used.
Which method should I use to replace URI.encode?
This is an encoding issue. First, encode the URI (note that URI.encode is obsolete and was removed in Ruby 3.0, so this only works on older Rubies):
encoded_url = URI.encode('https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potrà_non_rispettare_il/?sort=new')
And then parse it
URI.parse(encoded_url)
good luck
The only solution I have found uses the Addressable gem (https://github.com/sporkmonger/addressable):
Addressable::URI.parse('https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potrà_non_rispettare_il/?sort=new').host
Perhaps this could be an inelegant solution:
URI.parse(URI.extract(target.url).first)
# => #<URI::HTTPS https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potr>
Then I call the host method:
URI.parse(URI.extract(target.url).first).host
# => "www.reddit.com"
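Another option that stays within the standard library is to percent-encode only the non-ASCII characters before parsing. The helper `escape_non_ascii` below is hypothetical, not part of URI; it is a minimal sketch that assumes the string is UTF-8 and that the rest of the URL is already valid.

```ruby
require 'uri'

# Hypothetical helper (not part of the URI library): percent-encodes each
# non-ASCII character as its UTF-8 bytes and leaves everything else untouched.
def escape_non_ascii(url)
  url.gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

url = "https://www.reddit.com/r/italy/comments/i6ix3x/trenitalia_sostiene_che_potr\u00E0_non_rispettare_il/?sort=new"
escaped = escape_non_ascii(url)
# escaped now contains "potr%C3%A0_non" in place of the accented letter
URI.parse(escaped).host # => "www.reddit.com"
```

Unlike the URI.extract approach, this keeps the full path and query string intact.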
I have an S3 URL like so:
https://bucket.s3.amazonaws.com/uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg
What I'd like to do, is, using Ruby, remove the prefix https://bucket.s3.amazonaws.com/ leaving only uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg.
I'm unsure whether using gsub and just replacing the (hardcoded) prefix with an empty string is the right way to go, or if there's a more efficient approach.
url.gsub('https://bucket.s3.amazonaws.com/', '')
You can use URI from ruby's standard library:
irb> require 'uri'
=> true
irb> u = URI("https://bucket.s3.amazonaws.com/uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg")
=> #<URI::HTTPS:0x000000020995f0 URL:https://bucket.s3.amazonaws.com/uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg>
irb> u.path
=> "/uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg"
Alternatively, u.request_uri returns the path together with any query parameters on the URI.
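Since u.path starts with a slash and the question asks for the key without one, a small follow-up step is needed. Here is a sketch; the helper name `s3_key` is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: returns the URL path without its leading "/".
def s3_key(url)
  URI.parse(url).path.sub(%r{\A/}, '')
end

s3_key('https://bucket.s3.amazonaws.com/uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg')
# => "uploads/1c4248b2-4256-4af4-ac1b-0e1e3f7ec2c8/filename.jpg"
```

This avoids hardcoding the bucket host, so the same code works if the bucket name changes.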
I am using Ruby on Rails 3.0.10 and I would like to retrieve the scheme://domain part of a generic URL (note: a URL syntax can be also scheme://domain:port/path?query_string#fragment_id).
That is, for example, if I have the following URLs
http://stackoverflow.com/questions/7304043/how-to-retrieve-the-scheme-domainport-part-of-a-generic-url
ftp://some_link.org/some_other_string
# Consider also 'https', 'ftps' and so on... 'scheme:' values.
I would like to retrieve just the
# Note: The last '/' character was removed.
http://stackoverflow.com
ftp://some_link.org
# Consider also 'https', 'ftps' and so on... 'scheme:' values.
parts. How can I do that?
require 'uri'
uri = URI.parse("http://stackoverflow.com/questions/7304043/how-to-retrieve-the-scheme-domainport-part-of-a-generic-url")
url = "#{uri.scheme}://#{uri.host}" #url would be set to http://stackoverflow.com
From Module: URI.
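The snippet above drops the port. If a non-default port should be kept (the question mentions scheme://domain:port), something like the following sketch could work. The helper name `scheme_and_host` and the sample URLs with ports are assumptions made for the example.

```ruby
require 'uri'

# Hypothetical helper: returns "scheme://host", plus ":port" only when the
# port differs from the scheme's default (80 for http, 443 for https, ...).
def scheme_and_host(url)
  uri = URI.parse(url)
  base = "#{uri.scheme}://#{uri.host}"
  if uri.port && uri.port != uri.default_port
    "#{base}:#{uri.port}"
  else
    base
  end
end

scheme_and_host('http://stackoverflow.com/questions/7304043/how-to-retrieve-the-scheme-domainport-part-of-a-generic-url')
# => "http://stackoverflow.com"
scheme_and_host('http://localhost:3000/vendors/1')
# => "http://localhost:3000"
```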
I need to write the full path so need to know what the rails_root domain is. How do I do that? For example:
string = "{RAILS_ROOT}/vendors/#{@vendor.id}"
What is the equivalent of "RAILS_ROOT" that gives me the full domain for my application, so that in development it would substitute localhost:3000 and on my Heroku site the right full domain?
You should always avoid, if possible, hard-coding your path, because it is less flexible and more prone to result in broken links in the future. Plus, you can use Rails routing, which is an elegant way to generate everything cohesively in Rails without any need to create the composite parts yourself.
If you have your routes set up properly, you should be able to call:
link_to "View vendor", vendor_url(@vendor.id)
vendor_url(@vendor.id) in Rails gives you the full URL, which you can then store in your string variable. Here's how to generate the routes needed for the above:
# in routes.rb
resources :vendors
Try:
File.realpath(RAILS_ROOT)
You could access the request object. request.host_with_port would give you the hostname and port. request.protocol will give you the protocol (http:// or https://). request.fullpath will give you the path with query params.
I'm using the following to verify that a URL is validly formatted:
validates_format_of :website, :with => URI::regexp(%w(http https))
However, it doesn't work when the URL doesn't start with http:// or https://. Is there a similar way to validate URLs with URI::regexp (or URI) that also accepts valid URLs that don't start with http://? (For example, www.google.com is valid, as is http://www.google.com.)
This post on Daring Fireball provides a robust regex:
\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))
A more recent post improves on it (N.B. newlines and indentation added here for clarity; see the post for an even more expanded version with explanations):
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|
www\d{0,3}[.]|
[a-z0-9.\-]+[.][a-z]{2,4}/)
(?:[^\s()<>]+|
\(([^\s()<>]+|
(\([^\s()<>]+\)))*\))+
(?:\(([^\s()<>]+|
(\([^\s()<>]+\)))*\)|
[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
From my tests, URI::regexp is too loose in its definition of a URI (though it does require http…).
You can use a virtual attribute or a before_save filter to prepend http:// to your URLs if necessary.
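The normalization step could look roughly like this in plain Ruby. The helper name `with_default_scheme` is hypothetical, and wiring it into a before_save callback or a virtual attribute setter is left to the model; this is only a sketch of the string handling.

```ruby
# Hypothetical helper: prepends "http://" unless the value already starts
# with a scheme such as "http://" or "https://". Blank input is returned as-is.
def with_default_scheme(url)
  url = url.to_s.strip
  return url if url.empty?
  return url if url =~ %r{\A[a-z][a-z0-9+.\-]*://}i

  "http://#{url}"
end

with_default_scheme('www.google.com')        # => "http://www.google.com"
with_default_scheme('http://www.google.com') # => "http://www.google.com"
```

After normalizing, the existing validates_format_of check can stay as it is, because every stored value then carries a scheme.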
This is a Ruby version (with forward slashes escaped):
(?i)\b((?:https?:(?:\/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)\/)(?:[^\s()<>{}\[\]]+|\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\))+(?:\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’])|(?:(?<!#)[a-z0-9]+(?:[.\-][a-z0-9]+)*[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)\b\/?(?!#)))
Original gist by Gruber
Please copy it into Rubular for testing; I could not make a permanent link, probably because the regexp is too long.
It works with and without http, and with short domains like 'google.com'.