I want to search a string for a substring, but allow for differing but similar punctuation characters (including spaces).
For example, if I have the string
@"this is a 'difficult' string to search"
and I search for the substring
@"a ‘difficult‘ string"
it will not currently be found, because the substring uses different types of single quotes.
Is there a way of searching that allows for slight variations such as this? Or do I have to write my own way? And if I have to write my own way, how do I go about it?
Obviously there are many other possibilities that I want to cover. There are several kinds of:
double quotes (e.g. U+0022, U+201C, U+201D)
single quotes (U+0027, U+2018, U+2019)
dashes (U+002D, U+2010, U+2011, U+2012, U+2013, U+2014, U+2015 (etc))
spaces (U+0020, U+00A0 (etc))
etc
So how can I do something like
[myString rangeOfString:subString options:allowForSimilarPunctuation]
So far I have been altering the string and substring by replacing combinations of characters and running repeated searches, but this seems clumsy. Is there a cleverer way of doing this?
You can use character classes:
@"a [‘']difficult[‘'] string"
Here's an example on Regex101 if you want to give it a whirl:
https://regex101.com/r/iZ6lQ8/1
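One way to generalise the character-class idea is to generate the classes automatically from the search string, so that every listed punctuation character matches any member of its group. The sketch below is in Python rather than Objective-C, and the names EQUIVALENTS, fuzzy_pattern and fuzzy_find are made up for illustration. The groups contain only the code points listed in the question.

```python
import re

# Groups of characters treated as interchangeable (code points from the question).
# Extend these strings to cover more variants.
EQUIVALENTS = [
    "\u0022\u201C\u201D",                          # double quotes
    "\u0027\u2018\u2019",                          # single quotes
    "\u002D\u2010\u2011\u2012\u2013\u2014\u2015",  # dashes
    "\u0020\u00A0",                                # spaces
]

def fuzzy_pattern(substring):
    """Build a regex where each listed punctuation character matches its whole group."""
    parts = []
    for ch in substring:
        group = next((g for g in EQUIVALENTS if ch in g), None)
        if group is None:
            parts.append(re.escape(ch))                 # ordinary character: match literally
        else:
            parts.append("[" + re.escape(group) + "]")  # punctuation: match any equivalent
    return re.compile("".join(parts))

def fuzzy_find(haystack, substring):
    """Return the (start, end) range of the first match, or None if there is none."""
    match = fuzzy_pattern(substring).search(haystack)
    return match.span() if match else None

print(fuzzy_find("this is a 'difficult' string to search",
                 "a \u2018difficult\u2018 string"))
```

The same generated pattern could presumably be passed to rangeOfString:options: with NSRegularExpressionSearch, but that is untested here.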
I am using:
c.customerName =~ '(?i).*$q.*'
to find customer names case-insensitively, and this works fine for all standard characters. Unfortunately, German has special characters such as Ä, Ö and Ü. For these, the Cypher statement is case-sensitive: given two customer names such as Ötest and ötest, it finds only one of them, depending on whether you type a lowercase or an uppercase Ö.
Does anyone have a hint on how to extend the case-insensitive search to such special characters?
EDIT: The problem also occurs when a name includes, for example, an '&'. You will find the company D&A Construction when you type 'D&', but the moment you add a third character ('D&A'), the search fails and no result is shown. Any ideas?
You need to add a 'u' flag to your regex to turn it into a case-insensitive Unicode regex, like this:
c.customerName =~ '(?ui).*$q.*'
It works in my test; the flag comes from this StackOverflow question.
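As a rough illustration of the same idea outside Cypher, here is a Python sketch of a case-insensitive "contains" test. In Python 3, string patterns are Unicode-aware by default, so no separate 'u' flag is needed there. The function name name_matches is hypothetical. Escaping the user's query with re.escape is an added assumption, not part of the answer above; it only makes sure that characters typed by the user are matched literally, and it is not claimed to explain the 'D&A' failure.

```python
import re

def name_matches(name, query):
    """Case-insensitive 'contains' test that treats the query as literal text."""
    # re.escape neutralises regex metacharacters in user input (e.g. '.', '+', '(').
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return pattern.search(name) is not None

print(name_matches("Ötest", "ötest"))            # upper/lower Ö are treated as equal
print(name_matches("D&A Construction", "d&a"))
```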
I am having an issue with a URL query string and I believe the issue is that my parameter sometimes has a comma in it.
What happens is I have a query string that is generated from a list of group names so that my string looks something like:
Group=GroupName1,GroupName2,GroupName3
While doing some testing I noticed that some of my groups are not being displayed on the page even though they are in the query string. Then I noticed that the groups that are not showing are those that have a comma in the name. For example:
Group=People,%20Places%20and%20Stuff
Obviously the query string gets parsed looking for 'People' as one group and 'Places and Stuff' as another. This is an issue because the group is 'People, Places and Stuff'. I don't have any control over the group names, so they cannot be changed to not include commas. I tried to encode the comma in the string using %2C; however, that had no impact.
I did some searching, but I couldn't find anything other than a suggestion to change the server so that the delimiter isn't a comma, and I don't have the ability to do that. Is there any other solution, or am I stuck?
After doing a bunch of hunting I finally found the answer.
I was on the right track encoding the comma as %2C; however, it has to be preceded by an escape character, %5C (an encoded backslash). The URL query string would therefore be the following:
Group=People%5C%2C%20Places%20and%20Stuff
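The escaping scheme above can be sketched in Python as follows. The function names encode_groups and decode_groups are hypothetical, and decode_groups is only a guess at what such a server might do (split on commas that are not preceded by a backslash); the actual server-side parsing is not shown in the thread. Names that themselves contain backslashes are not handled.

```python
import re
from urllib.parse import quote, unquote

def encode_groups(names):
    """Backslash-escape commas inside each name, percent-encode it, then join with ','."""
    return ",".join(quote(name.replace(",", "\\,"), safe="") for name in names)

def decode_groups(value):
    """Assumed server-side counterpart: split on commas not preceded by a backslash."""
    parts = re.split(r"(?<!\\),", unquote(value))
    return [part.replace("\\,", ",") for part in parts]

query = "Group=" + encode_groups(["GroupName1", "People, Places and Stuff"])
print(query)
print(decode_groups(query[len("Group="):]))
```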
Is there an option in Neo4j to write a select query with a WHERE clause that ignores non-Latin characters?
MATCH (places:Place)
WHERE (places.name =~ '.*(?ui)Fabergé.*')
RETURN places
I have a place named Fabergé in the graph, and I want to find it whether the user types Fabergé or Faberge (without the special character).
I'm not aware of an easy way to do this directly with a regex match in Cypher.
One possible workaround is to store the string in normalized form in a second property, e.g. place.name_normalized, and then compare it with the normalized search string. Of course, the normalization needs to be done on the client side; see another SO question on how to achieve this: Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars
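A minimal client-side sketch of that normalization, assuming a Python client: decompose the text (NFD) and drop the combining marks, then case-fold. The function names are made up for illustration. Letters that have no decomposition (for example ł or ø) pass through unchanged, so this covers accents such as é but not every non-Latin variant.

```python
import unicodedata

def strip_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def normalize_for_search(text):
    """Value to store in (and compare against) a property such as name_normalized."""
    return strip_diacritics(text).casefold()

print(strip_diacritics("Fabergé"))
print(normalize_for_search("Fabergé") == normalize_for_search("FABERGE"))
```

The same function would be applied both when writing place.name_normalized and to the user's search string before it is sent to the query.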
In my program, I am trying to match a string that has two exclamation marks with a few words between them, like so: "! hello my name !". In this example, the text "hello my name" can vary in its number of words, from a single "hello" to many more words. How can I match the string between the exclamation marks? The main problem is that I cannot figure out which expression to use in the string match to represent multiple words of unknown length.
Use the pattern !([^!]*)!, in which [^!]* matches zero or more characters that aren't !.
print(string.match("! hello my name !","!([^!]*)!"))
Try also the pattern "!(.-)!".
This matches the shortest string of this form, unlike "!(.*)!", which matches the longest one.
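For comparison, the same three patterns can be tried with Python's re module, where Lua's lazy .- corresponds to .*?. This is only a sketch to show the difference between the negated class, the shortest match and the longest match; the sample strings are made up.

```python
import re

text = "! hello my name !"
# Negated character class: everything between the marks that is not '!'.
between = re.search(r"!([^!]*)!", text).group(1)

two = "!first! and !second!"
shortest = re.search(r"!(.*?)!", two).group(1)  # lazy, like Lua's "!(.-)!"
longest = re.search(r"!(.*)!", two).group(1)    # greedy, like Lua's "!(.*)!"

print(repr(between))
print(repr(shortest))
print(repr(longest))
```

Note that the capture keeps the spaces next to the exclamation marks, so it may need trimming afterwards.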
I want to use validates_format_of to validate a comma separated string with only letters (small and caps), and numbers.
So this is valid:
example1, example2, 22example44, ex24
but this is not:
^&*, <> , asfasfsdafas<#%$#
Basically I want to have users enter comma-separated words (including numbers) without special characters.
I'll use it to validate tags from acts_as_taggable_on. (I don't want to be a valid tag, for example.)
Thanks in advance.
You can always test regular expressions at Rubular. You would find that both tiftik's and Tim's regular expressions work, albeit with some strange edge cases around whitespace.
Tim's solution can be extended to allow leading and trailing whitespace, which should then do what you want:
^\s*[A-Za-z0-9]+(\s*,\s*[A-Za-z0-9]+)*\s*$
Presumably, once you have validated the input string, you will want to turn it into an array of tags to iterate over. You can do this as follows:
array_var = string_var.delete(' ').split(',')
^([a-zA-Z0-9]+,\s*)*[a-zA-Z0-9]+$
Note that this regex doesn't match values that contain whitespace, so it won't match multi-word values like "abc xyz, fgh qwe". It does allow any amount of whitespace after commas. You might not need ^ or $ if validates_format_of tries to match the whole string; I've never used Rails, so I don't know about that.
^[A-Za-z0-9]+([ \t]*,[ \t]*[A-Za-z0-9]+)*$
should match a CSV line that only contains those characters, whether it's just one value or many.
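To see where the three patterns from the answers differ, here is a small Python sketch that runs each of them against a few inputs. The dictionary keys and the helper names accepted_by and split_tags are invented for illustration, and re.fullmatch is used so the whole string must match. The whitespace edge cases are the interesting part: a space before a comma, or leading and trailing spaces.

```python
import re

# The three patterns quoted in the answers, under made-up labels.
PATTERNS = {
    "outer_whitespace_allowed": re.compile(r"^\s*[A-Za-z0-9]+(\s*,\s*[A-Za-z0-9]+)*\s*$"),
    "whitespace_after_comma_only": re.compile(r"^([a-zA-Z0-9]+,\s*)*[a-zA-Z0-9]+$"),
    "spaces_tabs_around_commas": re.compile(r"^[A-Za-z0-9]+([ \t]*,[ \t]*[A-Za-z0-9]+)*$"),
}

def accepted_by(value):
    """Return the labels of the patterns that accept the whole string."""
    return [label for label, pattern in PATTERNS.items() if pattern.fullmatch(value)]

def split_tags(value):
    """Split an already validated string into tags, trimming surrounding whitespace."""
    return [tag.strip() for tag in value.split(",")]

for sample in ["example1, example2, 22example44, ex24",
               "^&*, <> , asfasfsdafas<#%$#",
               "a ,b",
               " a,b "]:
    print(repr(sample), accepted_by(sample))

print(split_tags("example1, example2, 22example44, ex24"))
```

In Ruby itself, ^ and $ anchor at line boundaries rather than at the ends of the string, so \A and \z are the usual choice when a validation must cover the entire value.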