Validating URL using URI.parse - ruby-on-rails

I want to validate a URL in Rails using URI.parse, treating the URL as valid as long as URI.parse does not raise URI::InvalidURIError. Is that a good idea? What URLs would pass this validation test?

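Yes, it's a good idea if you plan to use those URLs in your app, but keep in mind that it is a permissive check.
URI.parse only proves that the string can be split into these parts: scheme, userinfo, host, port,
registry, path, opaque, query, fragment. Relative references parse too, so a bare string such as google passes.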
URI has dedicated classes for these schemes:
FTP, HTTP, HTTPS, LDAP, LDAPS, and MailTo.
Anything else is returned as URI::Generic.
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/uri/rdoc/URI/Parser.html#method-i-parse
If you have other schemes, you can handle them yourself after the parse.
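In Ruby 1.9.3 (the version documented at the link above), URI.parse works by calling URI.split, which uses two regular expressions:
URI::ABS_URI for absolute URIs
URI::REL_URI for relative URIs
You can inspect these to see exactly what they match. You can alter them too if you like. Newer Ruby versions parse with an RFC 3986-based parser instead, so the exact set of accepted strings differs between versions.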
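To make the "does not raise" idea concrete, here is a minimal sketch. The helper names parseable_uri? and http_url? are made up for illustration and are not part of Ruby or Rails. The first helper is the check described in the question; the second adds a stricter rule (HTTP or HTTPS with a host), which is one possible way to handle the scheme yourself after the parse.

```ruby
require "uri"

# Hypothetical helper: true when URI.parse does not raise.
# This is the permissive check from the question.
def parseable_uri?(string)
  URI.parse(string)
  true
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: stricter check applied after the parse.
# URI::HTTPS is a subclass of URI::HTTP, so both schemes pass.
def http_url?(string)
  uri = URI.parse(string)
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  false
end

parseable_uri?("http://www.google.com") # => true
parseable_uri?("google")                # => true  (parsed as a relative reference)
parseable_uri?("http://exa mple.com")   # => false (spaces are not allowed)

http_url?("https://www.google.com/")    # => true
http_url?("google")                     # => false (no scheme, no host)
```

In a Rails model you could call something like http_url? from a custom validation method, but how you wire it in is up to you.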

Related

Why is my OAuth not working with www in the url?

Is it intended? Do I need to redirect from www to non-www to get it working? Thanks
OAuth 2.0 specification says:
The redirection URI MUST be an absolute URI which
MAY include a query component which MUST be retained by
the authorization server when adding additional query parameters, and
MUST NOT include a fragment component.
The grammar for an absolute URI, as defined by RFC 3986, is:
absolute-URI = scheme ":" hier-part [ "?" query ]
It means each absolute-URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme.
Here is your answer:
Yes, it is intended.
The redirection URI should be an absolute URI: it must start with a scheme such as http, https, or ftp, followed by a colon (:).
For example, http://www.google.com is an absolute URI.
Hope that helps.
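If you want to check the two rules quoted above (absolute, no fragment) in code, here is a rough sketch in Ruby. The helper name acceptable_redirect_uri? is hypothetical and not part of any OAuth library; it only checks the syntax rules, not what your provider has registered.

```ruby
require "uri"

# Hypothetical helper: absolute URI (has a scheme) with no fragment component.
def acceptable_redirect_uri?(string)
  uri = URI.parse(string)
  uri.absolute? && uri.fragment.nil?
rescue URI::InvalidURIError
  false
end

acceptable_redirect_uri?("http://www.google.com")          # => true
acceptable_redirect_uri?("https://example.com/cb?state=1") # => true  (a query is allowed)
acceptable_redirect_uri?("www.google.com/callback")        # => false (no scheme)
acceptable_redirect_uri?("https://example.com/cb#token")   # => false (has a fragment)

# The www and non-www forms are different hosts as far as the parser is concerned.
# Assumption: a server that compares the registered URI exactly would treat
# them as two different redirect URIs.
with_www    = URI.parse("http://www.example.com/cb").host # => "www.example.com"
without_www = URI.parse("http://example.com/cb").host     # => "example.com"
```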

Why does http:// contain two slashes and file:/// three in a browser navigation bar?

Why does http:// contain two slashes—is that just a standard for a URL, or does it have any logical meaning? And why does file:/// contain three slashes, as in file:///C:/a.html?
The authority component of a URI has to be preceded by //:
The authority component is preceded by a double slash ("//") […]
This is also why not all URIs contain the double slash: because not all URIs have an authority component (e.g., URIs using the mailto scheme, the xmpp scheme, etc.).
If you wonder why the double slash instead of something else (or nothing) was chosen for (HTTP) URIs, see Tim Berners-Lee’s FAQ Why the //, #, etc? → What is the history of the //?
tl;dr: He copied the filename syntax which Apollo used.
By the way, he regrets that choice:
I have to say that now I regret that the syntax is so clumsy. I would like http://www.example.com/foo/bar/baz to be just written http:com/example/foo/bar/baz where the client would figure out that www.example.com existed and was the server to contact. But it is too late now.
As mentioned in this superuser post:
The complete syntax is file://host/path.
If the host is localhost, it can be omitted, resulting in
file:///path.
In other words, referring to files in your computer is just like referring to files in localhost.

url encode & url escape & url rewrite, what's the difference?

It's kind of confusing to differentiate these three terms.
It would be easier to understand if you could explain with examples.
URL encoding and URL escaping are one and the same.
URL encoding is the process of transforming user input to a CGI form so it is fit for travel across the network; basically, spaces and special characters present in the URL are replaced with escape sequences (a percent sign followed by two hex digits, such as %20 for a space).
URL rewriting changes the way you normally associate URLs with resources. Normally, test.com/aboutus makes us think that it will take us to the about us page. But internally, the server may take user 1 to /aboutus/page1.html, user 2 to /aboutus/page2.html, or any other resource. The URL exposed to the end user will be test.com/aboutus, but the resource being rendered can be different. Note that URL rewriting is performed by the server.
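Since examples were asked for, here is a small sketch of the encoding/escaping half in Ruby, using the standard uri library. The sample string and parameter names are made up. Note that these form-encoding helpers write a space as "+" rather than %20, which is the convention for query strings.

```ruby
require "uri"

raw = "ruby & rails"

# Encode a single value so it can travel safely inside a query string.
encoded = URI.encode_www_form_component(raw)     # => "ruby+%26+rails"

# Decoding reverses the transformation.
decoded = URI.decode_www_form_component(encoded) # => "ruby & rails"

# Build a whole query string from key/value pairs.
query = URI.encode_www_form({ "q" => raw, "page" => 2 }) # => "q=ruby+%26+rails&page=2"
```

URL rewriting, by contrast, is server configuration (for example rewrite rules in the web server), so there is nothing to encode or decode on the client side.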

htaccess credentials in URL when password contains a hash #

Using Selenium I am accessing protected pages. I need to put the credentials into the URL to prevent the .htaccess popup from appearing. This is the method suggested in Selenium documentation.
One of the locations I need to access has a hash character in the password, and this causes the browser (both Chrome and Firefox) to not understand the URL and treat it as a search term.
e.g. http://user:pass@example.com/ gets through, but http://user:pa#ss@example.com/ is not recognised as a URL.
How can I "encode" the hash?
You should use percent-encoding: encode the hash as %23, so the password pa#ss is written pa%23ss in the URL.
See also:
How to escape hash character in URL
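If you build the URL in code, you can let a library do the percent-encoding instead of replacing characters by hand. Here is a minimal sketch in Ruby; the user name, password, and host are placeholders, and ERB::Util.url_encode is just one of several standard helpers that would work here.

```ruby
require "erb"
require "uri"

user     = "user"
password = "pa#ss"

# Percent-encode the password so "#" is not read as the start of a fragment.
encoded_password = ERB::Util.url_encode(password) # => "pa%23ss"

url = "http://#{user}:#{encoded_password}@example.com/"
# url => "http://user:pa%23ss@example.com/"

# Sanity check: the host and user are now parsed as intended.
uri = URI.parse(url)
uri.host # => "example.com"
uri.user # => "user"
```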

Test for URL Format

I'd like to test whether the URL that the user inputs into my form is "proper", e.g. the following are proper:
http://www.google.com
www.google.com
www.google.com/
but the following probably shouldn't be:
google
http://www.go?ogle?#%
I don't have in mind what "proper" means, but is there some standard out there that I can use?
In HTML5 you can use the input element with the type value url: http://www.w3.org/TR/html5/states-of-the-type-attribute.html#url-state-type-url. You'd need to check which browsers have already implemented validation for it, though. If it's important, you'd also need server-side validation, of course.
Here you can see what URLs are considered valid by HTML5: http://www.w3.org/TR/html5/urls.html#valid-url. It references RFC 3986 for URIs and RFC 3987 for IRIs.
You should probably have a look at RegEx for URL validation (see for example this question: PHP validation/regex for URL) or check if your library/programming-language/CMS has special functions for it.
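As one example of using a language's built-in functions for the server-side check, here is a rough sketch in Ruby. The helper name proper_url? and its rules are assumptions chosen to match the examples in the question: a missing scheme is filled in with http://, and the host must contain a dot. It ignores many edge cases and is not a full implementation of RFC 3986.

```ruby
require "uri"

# Hypothetical helper: accepts http(s) URLs, with or without an explicit scheme,
# whose host contains at least one dot.
def proper_url?(input)
  candidate = input.strip
  # Prepend a default scheme when the input does not start with "scheme://".
  candidate = "http://#{candidate}" unless candidate.match?(%r{\A[a-z][a-z0-9+.-]*://}i)

  uri = URI.parse(candidate)
  uri.is_a?(URI::HTTP) && !uri.host.nil? && uri.host.include?(".")
rescue URI::InvalidURIError
  false
end

proper_url?("http://www.google.com") # => true
proper_url?("www.google.com")        # => true
proper_url?("www.google.com/")       # => true
proper_url?("google")                # => false (host has no dot)
```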
