How do you make a better SEO-friendly URL when there are UTF-8 characters and many spaces in between?

When someone enters many spaces between characters by mistake, I replace each space with a -. But what if there are many spaces in a row? For example:
User entered post title:
فارسی * Allposts---
When I convert the above example to user-friendly url (slug) I get this:
----فارسی---*-Allposts---
How can I put a single - in place of each run of spaces, remove special characters, and still preserve the UTF-8 characters? The output I'm looking for is:
فارسی-Allposts
Is there a way to handle this with a regex? If so, how?
EDIT:
Now I can manage multiple spaces as below:
$string = preg_replace('/\s+/', '-', $string);
but the problem with special characters still remains.

1. Remove special characters: replace [\-\?\*] or whatever your blacklist characters are with an empty string.
2. Convert runs of whitespace to a single - character: replace \s+ with -
Looks like you already figured out step 2. Make sure you do it second so you don't accidentally remove the hyphens that you just inserted.
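The two steps can be sketched as follows. This is a Python sketch rather than PHP, and the function name `slugify` is hypothetical. It assumes a whitelist instead of a blacklist: anything that is not a Unicode word character, whitespace, or a hyphen is dropped, which keeps the Persian letters intact. It also trims hyphens from both ends, which the two steps above do not do on their own. In PHP the same idea would likely need the `u` modifier on `preg_replace` so the subject is treated as UTF-8.

```python
import re

def slugify(title: str) -> str:
    # Step 1: drop everything except word characters (Unicode letters,
    # digits, underscore), whitespace, and hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", title)
    # Step 2: collapse each run of whitespace, underscores, or hyphens
    # into a single hyphen, then trim hyphens from both ends.
    return re.sub(r"[\s_-]+", "-", cleaned).strip("-")

print(slugify("    فارسی   * Allposts---"))
```

Note that invisible format characters such as the zero-width non-joiner are not word characters, so this sketch would remove them; whether that is acceptable depends on the titles you expect.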

Related

Standardize a String for Filename, remove accents and special chars

I'm trying to find a way to normalize a string to pass it as a filename.
I have this so far:
my_string.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n, '').downcase.gsub(/[^a-z]/, '_')
But the first problem is the - character, and I guess there are more problems with this method.
I don't control the name: the string can contain accents, white space, and special characters. I want to replace accented letters with the corresponding plain letter ('é' => 'e') and replace everything else with the '_' character.
The names are like:
"Prélèvements - Routine"
"Carnet de santé"
...
I want them to be like a filename with no space/special chars:
"prelevements_routine"
"carnet_de_sante"
...
Thanks for the help :)
Take a look at ActiveSupport::Inflector.transliterate; it's very useful for handling this kind of character problem. See the ActiveSupport::Inflector documentation.
Then, you could do something like:
ActiveSupport::Inflector.transliterate my_string.downcase.gsub(/\s/,"_")
Use ActiveStorage::Filename#sanitized, if spaces are okay.
If spaces are acceptable (and I would suggest keeping them when the file is user-provided or user-downloadable), you can use the ActiveStorage::Filename#sanitized method, which is meant for exactly this situation.
It removes characters that are not allowed in a file name while keeping the characters users typically rely on to organize and describe their files, such as spaces and ampersands (&).
ActiveStorage::Filename.new( "Prélèvements - Routine" ).sanitized
#=> "Prélèvements - Routine"
ActiveStorage::Filename.new( "Carnet de santé" ).sanitized
#=> "Carnet de santé"
ActiveStorage::Filename.new( "Foo:Bar / Baz.jpg" ).sanitized
#=> "Foo-Bar - Baz.jpg"
Use String#parameterize, if you want to remove nearly everything.
And if you're really looking to remove everything, try String#parameterize:
"Prélèvements - Routine".parameterize
#=> "prelevements-routine"
"Carnet de santé".parameterize
#=> "carnet-de-sante"
"Foo:Bar / Baz.jpg".parameterize
#=> "foo-bar-baz-jpg"
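Outside Rails, the normalize-then-strip idea from the question can be sketched with only a standard library. This is a Python sketch, not the Rails implementation, and the name `to_filename` is hypothetical. It assumes that letters without a Unicode decomposition (for example "ø") may simply be dropped rather than transliterated.

```python
import re
import unicodedata

def to_filename(name: str) -> str:
    # Decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping non-ASCII bytes removes the combining marks, leaving "e" for "é".
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-alphanumerics with one underscore and trim the ends.
    return re.sub(r"[^a-z0-9]+", "_", ascii_only.lower()).strip("_")

print(to_filename("Prélèvements - Routine"))
print(to_filename("Carnet de santé"))
```

Collapsing whole runs of non-alphanumerics, rather than replacing each character, is what keeps " - " from turning into three underscores.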

Sublime Text 2 - Perl Format String Syntax (Code Snippets)

I'm using the following code snippet:
Entity${0/(\w+)/\u\1/g}
This ensures the first character is uppercase and the rest is lowercase. How would I also ensure that hyphens (-) and special characters are removed?
Thanks in advance.
Figured it out by doing the following:
Entity${0/(\w+)([-\s]*)/\u\1/g}
At the moment, this only removes hyphens (-). I'd like to remove all characters except alphanumeric characters.
If there's a cleaner way, I'd be more than happy to accept your answer instead.

regex validation - grails constraints

I'm pretty new to Grails, and I'm having a problem with matches validation using a regex. I want my field to accept a combination of alphanumeric characters and specific special characters: period (.), comma (,), and dash (-). It may contain only numbers (099) or only letters (alpha), but it must not accept input that consists only of special characters (".-,"). Is it possible to filter this kind of input using a regex?
Please help. Thank you for sharing your knowledge.
^[0-9a-zA-Z,.-]*?[0-9a-zA-Z]+?[0-9a-zA-Z,.-]*$
meaning:
^ beginning of the string
[...]*? 0 or more characters from this class (lazy matching)
[...]+? 1 or more characters from this class (lazy matching)
[...]* 0 or more characters from this class
$ end of the string
I think you could match that with a regular expression like this:
".*[0-9a-zA-Z.,-]+.*"
That means:
".*" Zero or more of any character
"[0-9a-zA-Z.,-]+" One or more characters from the set 0-9, a-z, A-Z, period, comma, or dash (so at least one character from this set is mandatory)
".*" Zero or more of any character
Be aware that the mandatory set includes the period, comma, and dash themselves, so this pattern also accepts input made only of special characters, such as ".-,"; removing them from the middle set avoids that.
This is working ok for me, hope it helps!
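A quick way to compare the two patterns is to run them against the inputs from the question. The following is a Python sketch; the names `STRICT`, `LOOSE`, and `is_valid` are hypothetical, and `fullmatch` is used as an assumption that the whole field value must match. It does not check how the Grails matches constraint itself applies the pattern.

```python
import re

# First pattern from above; fullmatch makes the ^ and $ anchors implicit.
STRICT = re.compile(r"[0-9a-zA-Z,.-]*?[0-9a-zA-Z]+?[0-9a-zA-Z,.-]*")
# Second pattern from above, kept only for comparison.
LOOSE = re.compile(r".*[0-9a-zA-Z.,-]+.*")

def is_valid(value: str) -> bool:
    # True only if the value has at least one alphanumeric character and
    # otherwise consists of alphanumerics, commas, periods, and dashes.
    return STRICT.fullmatch(value) is not None

for sample in ["099", "alpha", "a.b-c,1", ".-,", ""]:
    print(repr(sample), is_valid(sample), LOOSE.fullmatch(sample) is not None)
```

The first pattern works because its middle class contains only alphanumerics, so at least one of them has to be present for the match to succeed.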

Non-reserved yet safe characters for delimiters in a URL

I have seen the following on StackOverflow about URL characters:
There are two sets of characters you need to watch out for - Reserved and Unsafe.
The reserved characters are:
ampersand ("&")
dollar ("$")
plus sign ("+")
comma (",")
forward slash ("/")
colon (":")
semi-colon (";")
equals ("=")
question mark ("?")
'At' symbol ("@").
The characters generally considered unsafe are:
space,
question mark ("?")
less than and greater than ("<>")
open and close brackets ("[]")
open and close braces ("{}")
pipe ("|")
backslash ("\")
caret ("^")
tilde ("~")
percent ("%")
pound ("#").
I'm trying to code a URL so I can parse it using delimiters. They can't be numbers or letters though. Does anyone have a list of characters that are NOT Reserved but ARE safe to use?
Thanks for any help you can provide.
Don't bother trying to find safe/unreserved characters. Just use whatever delimiters you want and URL-encode the whole thing. Then URL-decode it on the other end and parse normally.
Is there a reason you can't just use the standard delimiter for URL parameters (&)? That is the most straightforward way to do it instead of trying to roll your own.
For example, the standard URL syntax already allows multi-valued parameters natively. This is perfectly legal and doesn't require any trickery.
Somepage.aspx?parameterName=A&parameterName=B
The result is that the page would be passed "A,B" in the parameterName attribute.
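Both suggestions can be sketched with Python's standard `urllib.parse` module (the thread itself uses an ASP.NET page, so this only illustrates the idea). The variable names and the "|" delimiter are hypothetical choices, and the sketch assumes the individual values never contain the delimiter themselves.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Repeating a key is the standard way to send several values.
query = urlencode({"parameterName": ["A", "B"]}, doseq=True)
values = parse_qs(query)["parameterName"]
print(query, values)

# A custom delimiter survives the round trip if the packed value is
# percent-encoded as a whole (safe="" also encodes "/").
packed = "|".join(["red", "green/blue", "a&b"])
encoded = quote(packed, safe="")
decoded = unquote(encoded).split("|")
print(encoded, decoded)
```

Unlike the comma-joined string described above, `parse_qs` returns the repeated values as a list, so no further splitting is needed.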

username regex in rails

I am trying to find a regex to limit what a person can use for a username on my site. I don't need it to check how many characters there are, as another validation does this. Basically, all I need is for it to allow letters (capital and lowercase), numbers, dashes, and underscores.
I came across this: /^[-a-z]+$/i
But it doesn't seem to allow numbers.
What am I missing?
The regex you're looking for is
/\A[a-z0-9\-_]+\z/i
This means one or more characters from the range a-z, the range 0-9, - (which needs to be escaped with a backslash here), and _, matched case-insensitively (the i modifier).
Use
/\A[\w-]+\z/
\w is shorthand for letters, digits and underscore.
\A matches at the start of the string, \z matches at the end of the string. These tokens are called anchors, and Ruby is a bit special with regard to them: most regex engines use ^ and $ as start/end-of-string anchors by default, whereas in Ruby they match at the start/end of every line (which matters if you're working with multiline strings). Therefore, it's safer (as @JustMichael pointed out) to use \A and \z because there is no such ambiguity.
Your regular expression contains a character class [-a-z] that allows the characters - (dash) and a through z. In order to expand the range of characters allowed by this character class, you will need to add more characters within the [].
Please see Character Classes or Character Sets for further information and examples.
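The same whitelist can be sketched outside Ruby. The following Python sketch uses hypothetical names (`USERNAME`, `valid_username`) and makes two assumptions: usernames should be ASCII-only, so the class is spelled out instead of using \w (which also matches non-ASCII letters in Python 3), and the whole string must match, so `fullmatch` is used instead of anchors. Python has its own anchor pitfall, shown in the last line: `$` also matches just before a trailing newline.

```python
import re

# Letters, digits, underscore, and dash; the dash is last so it needs no escape.
USERNAME = re.compile(r"[A-Za-z0-9_-]+")

def valid_username(name: str) -> bool:
    # fullmatch requires the entire string to match, with no anchors needed.
    return USERNAME.fullmatch(name) is not None

for candidate in ["user_42-x", "bad name", "bob\n", ""]:
    print(repr(candidate), valid_username(candidate))

# Anchor pitfall in Python: this still matches despite the trailing newline.
print(re.match(r"^[\w-]+$", "bob\n") is not None)
```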
