When a user accidentally enters several spaces between characters, I replace every space with -. But what if there are many spaces in a row? For example:
User entered post title:
فارسی * Allposts---
When I convert the above example to user-friendly url (slug) I get this:
----فارسی---*-Allposts---
How can I put a single - in place of each run of spaces, remove special characters, and still preserve UTF-8 characters? The output I'm looking for is:
فارسی-Allposts
Is there a way to handle this with a regex? If so, how?
EDIT:
Now I can manage multiple spaces as below:
$string = preg_replace('/\s+/', '-', $string);
but the problem with special characters remains.
1. Remove special characters: replace [\-\?\*] (or whatever your blacklisted characters are) with an empty string.
2. Convert runs of whitespace to a single - character: replace \s+ with -
Looks like you already figured out step 2. Make sure you do it second, so you don't accidentally remove the hyphens you just inserted.
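The two steps can be sketched as follows. This is only an illustration, written in Ruby rather than the PHP used in the question, and it swaps the blacklist for a Unicode-aware whitelist (\p{L} for letters, \p{N} for digits) so that Persian text survives. The helper name slugify is hypothetical, and the final trim of leading/trailing hyphens is an extra step assumed from the desired output.

```ruby
# Hypothetical helper: builds a URL slug while keeping non-ASCII letters.
def slugify(title)
  title
    .gsub(/[^\p{L}\p{N}\s-]/, '') # step 1: drop everything except letters, digits, whitespace, hyphens
    .gsub(/[\s-]+/, '-')          # step 2: collapse each run of whitespace/hyphens into one hyphen
    .gsub(/\A-+|-+\z/, '')        # extra step: trim hyphens left at either end
end

puts slugify("   فارسی   *  Allposts---")
```

The same three patterns should carry over to PHP's preg_replace with the u modifier, but that is untested here.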
I'm trying to find a way to normalize a string to pass it as a filename.
I have this so far:
my_string.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n, '').downcase.gsub(/[^a-z]/, '_')
But first problem: the - character. I guess there are more problems with this method.
I don't control the name; the string can contain accents, whitespace and special characters. I want to replace accented letters with the corresponding plain letter ('é' => 'e') and replace everything else with the '_' character.
The names are like:
"Prélèvements - Routine"
"Carnet de santé"
...
I want them to be like a filename with no space/special chars:
"prelevements_routine"
"carnet_de_sante"
...
Thanks for the help :)
Take a look at ActiveSupport::Inflector.transliterate; it's very useful for handling this kind of character problem. See the ActiveSupport::Inflector documentation.
Then, you could do something like:
ActiveSupport::Inflector.transliterate(my_string).downcase.gsub(/[^a-z0-9]+/, "_")
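If ActiveSupport is not available, roughly the same result can be had with plain Ruby. The sketch below is not what transliterate does internally; it assumes that decomposing accented letters with String#unicode_normalize and dropping the combining marks is good enough for names like the ones in the question. The helper name filename_safe is hypothetical.

```ruby
# Hypothetical helper: ASCII-only, lowercase, underscore-separated name.
def filename_safe(name)
  name
    .unicode_normalize(:nfkd)  # split "é" into "e" plus a combining accent
    .gsub(/\p{Mn}/, '')        # drop the combining accents
    .downcase
    .gsub(/[^a-z0-9]+/, '_')   # any other run of characters becomes a single "_"
    .gsub(/\A_+|_+\z/, '')     # no leading or trailing "_"
end

puts filename_safe("Prélèvements - Routine")
puts filename_safe("Carnet de santé")
```

Letters that have no decomposition (for example "ø" or "ß") are simply replaced by "_" here, which is one place where transliterate behaves better.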
Use ActiveStorage::Filename#sanitized, if spaces are okay.
If this is a user-provided and/or user-downloadable file, I would suggest keeping the spaces. In that case you can use the ActiveStorage::Filename#sanitized method, which is meant for exactly this situation.
It removes special characters that are not allowed in a file name, while keeping the characters that users typically use to organize and describe their files, such as spaces and ampersands (&).
ActiveStorage::Filename.new( "Prélèvements - Routine" ).sanitized
#=> "Prélèvements - Routine"
ActiveStorage::Filename.new( "Carnet de santé" ).sanitized
#=> "Carnet de santé"
ActiveStorage::Filename.new( "Foo:Bar / Baz.jpg" ).sanitized
#=> "Foo-Bar - Baz.jpg"
Use String#parameterize, if you want to remove nearly everything.
And if you're really looking to remove everything, try String#parameterize:
"Prélèvements - Routine".parameterize
#=> "prelevements-routine"
"Carnet de santé".parameterize
#=> "carnet-de-sante"
"Foo:Bar / Baz.jpg".parameterize
#=> "foo-bar-baz-jpg"
I'm using the following code snippet:
Entity${0/(\w+)/\u\1/g}
This uppercases the first character of each word. How would I also ensure that hyphens (-) and special characters are removed?
Thanks in advance.
Figured it out by doing the following:
Entity${0/(\w+)([-\s]*)/\u\1/g}
At the moment, this only removes hyphens (-) and whitespace. I'd like to remove all characters except alphanumeric characters.
If there's a cleaner way, I'd be more than happy to accept your answer instead.
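To make the intended transformation concrete, here it is spelled out in Ruby. This is not snippet syntax and says nothing about what the snippet engine supports; it only shows the target behaviour (keep alphanumeric runs, uppercase the first character of each, drop everything else). The helper name entity_name is hypothetical.

```ruby
# Hypothetical helper mirroring what the snippet transform should produce.
def entity_name(input)
  words = input.scan(/[A-Za-z0-9]+/) # keep only runs of alphanumeric characters
  "Entity" + words.map { |w| w[0].upcase + w[1..-1] }.join
end

puts entity_name("user-profile settings!")
```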
I'm pretty new to Grails and I'm having a problem with the matches validation using a regex. I want my field to accept a combination of alphanumeric characters and specific special characters: period (.), comma (,) and dash (-). It may contain numbers only (099) or letters only (alpha), but it must not accept input that consists only of special characters (".-,"). Is it possible to filter this kind of input using a regex?
Please help. Thank you for sharing your knowledge.
^[0-9a-zA-Z,.-]*?[0-9a-zA-Z]+?[0-9a-zA-Z,.-]*$
meaning:
/
^ beginning of the string
[...]*? 0 or more characters from this class (lazy matching)
[...]+? 1 or more characters from this class (lazy matching)
[...]* 0 or more characters from this class
$ end of the string
/
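A quick way to convince yourself the pattern does what the question asks is to run it against the examples given. The check below is a sketch in Ruby, not Grails; it uses \A and \z instead of ^ and $ because those are Ruby's whole-string anchors. The names VALID_INPUT and valid_input? are hypothetical.

```ruby
# Same pattern as above, anchored to the whole string the Ruby way.
VALID_INPUT = /\A[0-9a-zA-Z,.-]*?[0-9a-zA-Z]+?[0-9a-zA-Z,.-]*\z/

# Hypothetical helper: true when the value has at least one letter or digit
# and otherwise only letters, digits, ".", "," or "-".
def valid_input?(value)
  VALID_INPUT.match?(value)
end

["099", "alpha", "a.b,c-1", ".-,", "a b"].each do |sample|
  puts "#{sample.inspect}: #{valid_input?(sample)}"
end
```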
I think you could match that with a regular expression like this:
"[0-9a-zA-Z.,-]*[0-9a-zA-Z]+[0-9a-zA-Z.,-]*"
That means:
"[0-9a-zA-Z.,-]*" Begin with zero or more characters in the range 0-9, a-z, etc, or . or , or -
"[0-9a-zA-Z]+" Have one or more letters or digits (so it's mandatory to have at least one character from this set)
"[0-9a-zA-Z.,-]*" End with zero or more characters from the first set
This is working ok for me, hope it helps!
I have seen the following on StackOverflow about URL characters:
There are two sets of characters you need to watch out for - Reserved and Unsafe.
The reserved characters are:
ampersand ("&")
dollar ("$")
plus sign ("+")
comma (",")
forward slash ("/")
colon (":")
semi-colon (";")
equals ("=")
question mark ("?")
'At' symbol ("@").
The characters generally considered unsafe are:
space,
question mark ("?")
less than and greater than ("<>")
open and close brackets ("[]")
open and close braces ("{}")
pipe ("|")
backslash ("\")
caret ("^")
tilde ("~")
percent ("%")
pound ("#").
I'm trying to code a URL so I can parse it using delimiters. They can't be numbers or letters though. Does anyone have a list of characters that are NOT Reserved but ARE safe to use?
Thanks for any help you can provide.
Don't bother trying to use safe/unreserved characters. Just use whatever delimiters you want and URL-encode the whole thing. Then URL-decode it on the other end and parse normally.
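As a rough sketch of that round trip, here is how it could look in Ruby with the standard uri library. The "|" delimiter and the helper names pack and unpack are assumptions chosen for illustration, and the individual parts must not contain the delimiter themselves.

```ruby
require "uri"

DELIMITER = "|" # hypothetical delimiter; any character works once the value is encoded

# Join the parts, then percent-encode the whole value for use in a URL.
def pack(parts)
  URI.encode_www_form_component(parts.join(DELIMITER))
end

# Decode first, then split on the delimiter.
def unpack(encoded)
  URI.decode_www_form_component(encoded).split(DELIMITER)
end

encoded = pack(["a b", "c&d", "e/f"])
puts encoded
puts unpack(encoded).inspect
```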
Is there a reason you can't just use the standard delimiter for URL parameters (&)? That is the most straightforward way to do it instead of trying to roll your own.
For example, the standard URL syntax already allows for multi-valued parameters natively. This is perfectly legal and doesn't require any trickery.
Somepage.aspx?parameterName=A&parameterName=B
The result is that the page would be passed "A,B" in the parameterName attribute.
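How repeated keys are exposed depends on the framework; the comma-joined "A,B" is what the answer describes for this page. For comparison, here is a small Ruby sketch using the standard uri library, which returns every key/value pair and leaves the grouping to you. The helper name values_for is hypothetical.

```ruby
require "uri"

# Hypothetical helper: all values supplied for one parameter name, in order.
def values_for(query, name)
  URI.decode_www_form(query)
     .select { |key, _value| key == name }
     .map { |_key, value| value }
end

query = "parameterName=A&parameterName=B"
puts values_for(query, "parameterName").join(",")
```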
I am trying to find a regex to limit what a person can use for a username on my site. I don't need to have it check to see how many characters there are in it, as another validation does this. Basically all I need to make it do is make sure that it allows: letters (capital and lowercase) numbers, dashes and underscores.
I came across this: /^[-a-z]+$/i
But it doesn't seem to allow numbers.
What am I missing?
The regex you're looking for is
/\A[a-z0-9\-_]+\z/i
That is: one or more characters from a-z, 0-9, - (escaped with a backslash so it is not read as a range) and _, matched case-insensitively (the i modifier).
Use
/\A[\w-]+\z/
\w is shorthand for letters, digits and underscore.
\A matches at the start of the string, \z matches at the end of the string. These tokens are called anchors, and Ruby is a bit special with regard to them: most regex engines use ^ and $ as start/end-of-string anchors by default, whereas in Ruby they can also match at the start/end of lines (which matters if you're working with multiline strings). Therefore, it's safer (as @JustMichael pointed out) to use \A and \z because there is no such ambiguity.
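The difference is easy to see with a two-line input. In this small sketch the constant names and valid_username? are hypothetical; the two patterns differ only in their anchors.

```ruby
LINE_ANCHORED   = /^[\w-]+$/    # ^ and $ match at line boundaries in Ruby
STRING_ANCHORED = /\A[\w-]+\z/  # \A and \z match only at the ends of the string

# Hypothetical validator using the whole-string anchors.
def valid_username?(name)
  STRING_ANCHORED.match?(name)
end

input = "good_name\n<script>"
puts LINE_ANCHORED.match?(input) # the first line alone satisfies the pattern
puts valid_username?(input)      # the whole string does not
```

With ^ and $ the second line is never checked, which is exactly the ambiguity described above.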
Your regular expression contains a character class [-a-z] that allows the characters - (dash) and a through z. In order to expand the range of characters allowed by this character class, you will need to add more characters within the [].
Please see Character Classes or Character Sets for further information and examples.