Rails: validation for top level domain only - ruby-on-rails

I need a regex, or maybe a native Rails trick, to check that the user entered only a domain (without "http", "https", "www" and so on).
So, this one would be valid:
google.com.ua
And this would be invalid:
https://www.google.com.ua
Maybe it can be simplified to checking that the only punctuation in the string is dots, treating that as valid, and blocking the string if it contains any other characters.
Please tell me what is better to use in this case, and what the regex (or another solution) would be.
Thanks.

^(?!www\.)[a-zA-Z0-9.-]+$
Try this. See the demo:
https://regex101.com/r/wZ0iA3/5
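In Ruby, ^ and $ match at line breaks rather than at the ends of the string, so a value with an embedded newline could slip past the pattern above. A minimal sketch of the same check anchored with \A and \z follows; the constant DOMAIN_ONLY and the helper domain_only? are illustrative names, not part of the original answer. Like the original pattern, it only restricts the allowed characters and does not verify that the value is a real domain.

```ruby
# Same character class and "www." lookahead as the pattern above,
# but anchored to the whole string instead of a single line.
DOMAIN_ONLY = /\A(?!www\.)[a-zA-Z0-9.-]+\z/

# Hypothetical helper: true when the value contains only letters,
# digits, dots and hyphens and does not start with "www.".
def domain_only?(value)
  DOMAIN_ONLY.match?(value.to_s)
end

puts domain_only?("google.com.ua")             # bare domain
puts domain_only?("https://www.google.com.ua") # scheme and "www." present
```

In a Rails model the same constant could probably be passed to a format validation, for example validates :domain, format: { with: DOMAIN_ONLY }, assuming the attribute is called domain.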

Related

Regex URL validation in Ruby

I have the following ruby code in a model:
if privacy_policy_link.match(/#{domain}($|\/$)/)
errors.add(:privacy_policy_link, 'Link to a specific privacy policy page on your site instead of your homepage.')
end
This worked until a user tried to save a privacy policy link that looked like this:
https://example.com/about-me/privacy-policy-for-example-com/
The goal is that I don't want them linking to their base homepage (example.com or www.example.com etc) for this privacy policy link (for some random, complicated reasons I won't go into). The link provided above should pass the matcher (meaning that because they are linking to a separate page on their site, it shouldn't be considered a match and they shouldn't see any errors when saving the form) - but because they reference their base domain in second half of the url, it comes up as a match.
I cannot, for the life of me, figure out the correct regex on rubular to get this url to pass the matching algorithm. And of course I cannot just ask the user to rename their privacy policy link to remove the "com" from it at the end - because this: https://example.com/about-me/privacy-policy-for-example would pass. :)
I would be incredibly grateful for any assistance that could help me understand how to solve this problem!
Rubular link: http://rubular.com/r/G5OmYfzi6t
Your issue is that the . character matches any character, so it matched the - in example-com.
If you anchor the pattern to the beginning of the line, it will match correctly without needing to escape the . in the domain.
if privacy_policy_link.match(%r{^(http[s]?://|)(www\.)?#{domain}($|/$)})
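If the dot in the domain should only ever match a literal dot, Regexp.escape can be applied before interpolating it. A sketch along those lines, where homepage_link? is a hypothetical helper name and domain is assumed to hold a bare host such as example.com:

```ruby
# Hypothetical helper: true when the link points at the bare homepage
# of the given domain (optional scheme, optional "www.", optional "/").
def homepage_link?(link, domain)
  # Regexp.escape turns "example.com" into "example\.com", so the dot
  # can no longer match the "-" in "example-com".
  pattern = %r{\A(?:https?://)?(?:www\.)?#{Regexp.escape(domain)}/?\z}i
  pattern.match?(link.to_s)
end

puts homepage_link?("https://example.com/", "example.com")
puts homepage_link?("https://example.com/about-me/privacy-policy-for-example-com/", "example.com")
```

In the model, the error would then be added only when homepage_link?(privacy_policy_link, domain) is true.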

How can I show the name of the link without http://, https://, and everything that goes after .com and other similar domains?

In my view I'm displaying the link like this:
<%= @casino.play_now_link %>
So, @casino.play_now_link can be like this: https://www.spinstation.com/?page=blockedcountry&content=1 What I need is to display only this part: www.spinstation.com. I tried gsub('http://', '').gsub('https://', ''), and it works, but how can I remove the part of the URL after .com? Thanks in advance.
Don't use regexes at all for this sort of thing, use URI from the standard library:
URI.parse(@casino.play_now_link).hostname
or, for a more robust solution, use Addressable:
Addressable::URI.parse(@casino.play_now_link).hostname
Of course, this assumes that you've properly validated that your play_now_links are valid URIs. If you haven't then you can add validations that use URI or Addressable to do so and either clean up existing play_now_links that aren't valid URIs or wrap the parsing and hostname extraction in a method (which is a good idea anyway) with some error handling.
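The wrapping method with error handling mentioned above might look roughly like this. It uses only the standard library; display_host is a hypothetical name, and returning nil for an unparseable link is an assumption about what the view should do.

```ruby
require "uri"

# Hypothetical helper: returns the host part of a link, or nil when
# the link cannot be parsed as a URI.
def display_host(link)
  URI.parse(link).hostname
rescue URI::InvalidURIError
  nil
end

puts display_host("https://www.spinstation.com/?page=blockedcountry&content=1")
```

The view could then call the helper with @casino.play_now_link instead of parsing inline.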
A simpler way is to use
.split('/')[2]
which avoids a regex but depends on the URL containing the '//' after the scheme.
But as @mu is too short mentioned, URI is better for this.

Can friendly-id gem work with capital letters in url e.g. /users/joe-blogs and /users/Joe-Blogs both work

I like the friendly_id gem, but one problem I'm seeing is that when I type in a URL with a capital letter in it, such as /users/Joe-Blogs, it can't find the page. It's a little trivial, but most sites can handle something like this and will serve the page whether or not the URL has a capital letter. Does anyone know a fix for this?
Edit: to clarify, this is for when users enter a URL manually and put capitals in it just because it's a name, like author/Joe-Blogs. I've seen other sites handle this, but Rails seems to just give a 404.
friendly_id uses parameterize to create the slugs.
I think the best way to solve your problem is to parameterize the params before using it to find.
# controller
User.find(params[:id].parameterize)
Or parameterize the url where the link originated from.
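The effect on the incoming id can be illustrated without Rails. The sketch below is a plain-Ruby approximation of what parameterize does for simple ASCII names; it is not the ActiveSupport implementation, and normalize_slug is a hypothetical helper name.

```ruby
# Hypothetical stand-in for String#parameterize, ASCII input only.
def normalize_slug(raw)
  raw.to_s
     .downcase
     .gsub(/[^a-z0-9\-_]+/, "-") # anything else becomes a separator
     .gsub(/-{2,}/, "-")         # collapse repeated separators
     .gsub(/\A-|-\z/, "")        # trim separators at the ends
end

puts normalize_slug("Joe-Blogs")
```

With this normalization, /users/Joe-Blogs and /users/joe-blogs both resolve to the same slug before the lookup.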
As an addition to Vic's answer, you'll want to look at url normalization:
The following normalizations are described in RFC 3986 to result in equivalent URLs:
Converting the scheme and host to lower case.
The scheme and host components of the URL are case-insensitive. Most normalizers will convert them to lowercase.
Example: HTTP://www.Example.com/ → http://www.example.com/
In short - it's against convention to use capitalization in your urls.
You may also wish to look at URI normalize; more importantly, you should work to remove the capitalization from your URLs:
URI.parse(params[:id]).normalize
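For reference, here is a small sketch of what URI#normalize does to a full URL, using only the standard library. The helper name normalized_url is hypothetical. Note that only the scheme and host are lowercased; the path keeps its capitals, so a slug such as Joe-Blogs would still need to be downcased separately.

```ruby
require "uri"

# Hypothetical helper: parse a URL and return its normalized string form.
def normalized_url(url)
  URI.parse(url).normalize.to_s
end

puts normalized_url("HTTP://www.Example.com/")
puts normalized_url("HTTP://www.Example.com/users/Joe-Blogs")
```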

Best format for adding a version id into a URL path

I'm currently re-working an application and want to add in a version number to the application URL paths. For example:
http://mydomain/app/VERSION-ID/resource/...
My question is, what is the correct or standard format to add a version id to a URL string? Is there any disadvantage to just having it numeric (1.1 or 1-1):
Example: https://api.twitter.com/1.1/account/verify_credentials.json
Or is it better to have a non numeric identifier to be more intuitive as the url is public facing?
Thanks.
Do not use dots in a URL unless you're defining domain spaces. Use either dashes or other truncated versions (that don't use disallowed characters in the URL).
Example: https://api.twitter.com/v1-1/account/verify_credentials.json
UPDATE: Here is some more information in another thread. My preference is not to use dots if at all possible, but it is apparently OK to do.
Can urls contain dots in the path part?
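As a quick check that a dotted version segment is at least parseable, Ruby's standard URI library accepts both styles of path. The helper name version_segment is hypothetical, and it assumes the version is the first path segment.

```ruby
require "uri"

# Hypothetical helper: returns the first path segment of a URL,
# assumed here to be the version identifier.
def version_segment(url)
  URI.parse(url).path.split("/").reject(&:empty?).first
end

puts version_segment("https://api.twitter.com/1.1/account/verify_credentials.json")
puts version_segment("https://api.twitter.com/v1-1/account/verify_credentials.json")
```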

How to check if URLs match, within a huge database of online products?

The problem seems simple at first, but it is not. I'm using Mongo and Node.js.
Problem: I have a URL, and I need to match it against all the URLs I have in my database. Remember, there is no rule that the URL I'm on always has "category" in front, or anything like that. And please don't take "cases" into consideration.
I have no knowledge of the parameter names or anything else.
Suppose the URL is something like example.com/category/product_name.html?session_id=2423412fd
In the database I only have example.com/product_name.html
Or the URL is something like example.com/index.php?productid=6&category=3&utm_campaign=google&utm_source=click
In the database I only have example.com/index.php?productid=6
Or the URL is something like example.com/product_name.html
In the database I only have example.com/category/subcategory/product.html
I think I've made my point. What I'm looking for is a solution that matches URLs in all such cases (there are more than these). It can be an external service, a class, or something complex.
But I need it to work, and to work very fast, because it runs on every page refresh.
Thank you!
I would use this function to split the URL into its parts: http://php.net/manual/en/function.parse-url.php
Then take the parts of the path you want to match and query your database URLs for matches.
To follow on from Anagio's answer, the URL
example.com/index.php?productid=6&category=3&utm_campaign=google&utm_source=click
could be saved as a Mongo object like:
{
  url: "example.com/index.php?productid=6&category=3&utm_campaign=google&utm_source=click",
  indexes: [
    "example.com",
    "index.php",
    "productid=6",
    "category=3",
    "utm_campaign=google",
    "utm_source=click"
  ]
}
You could then split up any new URL using the same algorithm, then do a map/reduce on the indexes field for scoring and then take the highest score as the best "fuzzy match"
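The splitting and scoring can be sketched in a few lines. The example below is in Ruby rather than Node.js and runs in memory instead of as a Mongo map/reduce; url_tokens and best_match are hypothetical names, and counting shared tokens is just one simple scoring choice.

```ruby
# Split a URL into the same kind of tokens as the "indexes" array above:
# host, path segments, then individual query parameters.
def url_tokens(url)
  host_and_path, query = url.to_s.sub(%r{\Ahttps?://}i, "").split("?", 2)
  tokens = host_and_path.to_s.split("/").reject(&:empty?)
  tokens.concat(query.split("&")) if query
  tokens
end

# Score each candidate by how many tokens it shares with the URL
# and return the highest-scoring one.
def best_match(url, candidates)
  wanted = url_tokens(url)
  candidates.max_by { |candidate| (url_tokens(candidate) & wanted).size }
end

stored = ["example.com/index.php?productid=6", "example.com/index.php?productid=7"]
puts best_match("example.com/index.php?productid=6&category=3&utm_campaign=google&utm_source=click", stored)
```

For a large collection the token arrays would be stored and indexed in the database, as described above, rather than recomputed for every candidate.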
