In my program, I am trying to match a string that has two exclamation marks with a few words between them, like so: "! hello my name !" In this example, the string "hello my name" can vary in the number of words: it could be just "hello", or even more words. How can I match the string between the exclamation marks? The main problem is that I cannot figure out the expression to use in string.match to represent a string with multiple words of unknown length.
Use the pattern !([^!]*)!, in which [^!]* matches zero or more characters that aren't !.
print(string.match("! hello my name !","!([^!]*)!"))
Try also the pattern "!(.-)!".
This matches the shortest string of this form, unlike "!(.*)!", which matches the longest one.
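For readers working outside Lua, here is a minimal sketch of the same three patterns in Ruby (an assumption; the question itself is about Lua). Ruby spells the shortest-match quantifier `.*?` where Lua uses `.-`. The sample text with two pairs of exclamation marks is made up for illustration.

```ruby
# Hypothetical sample with two "!...!" sections, to show where each pattern stops.
text = "! hello my name ! and ! more !"

# Negated character class: stops at the next "!".
negated = text[/!([^!]*)!/, 1]   # " hello my name "

# Lazy quantifier (Ruby's counterpart of Lua's ".-"): shortest match.
lazy = text[/!(.*?)!/, 1]        # " hello my name "

# Greedy quantifier: runs to the last "!" in the string.
greedy = text[/!(.*)!/, 1]       # " hello my name ! and ! more "

puts negated.strip, lazy.strip, greedy.strip
```

Calling `strip` on the capture drops the spaces next to the exclamation marks, if they are not wanted.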
Related
I want to achieve this: retrieve a word from a CSV file, then check whether a post contains a hashtag with that word. The problem is that I was unable to perform the concatenation.
The "Type mismatch" error could be solved by enclosing the concatenation in parentheses, as in:
WHERE line[0] =~ (".*#" + line[0] + ".*")
However, logically, that WHERE clause can never be true. A string cannot be equal to a longer string (itself, preceded by an extra character).
If you are just trying to see if a word starts with a hashtag, this should work:
WHERE line[0] STARTS WITH "#"
Or, if you want to see if there is a hashtag in the string:
WHERE line[0] CONTAINS "#"
I want to search a string for a substring, but allow for differing but similar punctuation characters (including spaces).
For example, if I have the string
@"this is a 'difficult' string to search"
and I search for the substring
@"a ‘difficult‘ string"
it will not currently be found, because the substring uses different types of single quotes.
Is there a way of searching that allows for slight variations such as this? Or do I have to write my own way? And if I have to write my own way, how do I go about it?
Obviously there are many other possibilities that I want to cover. There are a number of types of:
double quotes (e.g. U+0022, U+201C, U+201D)
single quotes (U+0027, U+2018, U+2019)
dashes (U+002D, U+2010, U+2011, U+2012, U+2013, U+2014, U+2015 (etc))
spaces (U+0020, U+00A0 (etc))
etc
So how can I do something like
[myString rangeOfString:subString options:allowForSimilarPunctuation]
So far I have been altering the string and substring by replacing combinations of characters and doing repeated searches, but this seems clumsy. Is there a cleverer way of doing this?
You can use character classes:
@"a [‘']difficult[‘'] string"
Here's an example on Regex101 if you want to give it a whirl:
https://regex101.com/r/iZ6lQ8/1
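The character-class idea can be generalised so the pattern is built from the substring automatically. Below is a rough sketch in Ruby rather than Objective-C (an assumption made to keep the example short and runnable). The helper name `tolerant_regex` and the constant `EQUIVALENTS` are hypothetical, and the groups only cover the code points listed in the question.

```ruby
# Groups of characters treated as interchangeable (code points from the question).
EQUIVALENTS = [
  "\u0022\u201C\u201D",                          # double quotes
  "\u0027\u2018\u2019",                          # single quotes
  "\u002D\u2010\u2011\u2012\u2013\u2014\u2015",  # dashes
  "\u0020\u00A0"                                 # spaces
].freeze

# Builds a regex in which every character that belongs to a group is replaced
# by a character class for the whole group; all other characters match literally.
def tolerant_regex(substring)
  parts = substring.each_char.map do |ch|
    group = EQUIVALENTS.find { |g| g.include?(ch) }
    group ? "[#{Regexp.escape(group)}]" : Regexp.escape(ch)
  end
  Regexp.new(parts.join)
end

haystack = "this is a 'difficult' string to search"
needle   = "a \u2018difficult\u2018 string"

puts haystack.include?(needle)               # false: the quote characters differ
puts haystack.index(tolerant_regex(needle))  # 8
```

The same approach should carry over to NSRegularExpression, since only the pattern string changes.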
The Mongoid documentation only gives one example of doing a wildcard search:
Person.where(first_name: /^d/i)
This finds all people with the first name that starts with "d".
What do the /^ and /i represent?
How do I find all people with their first name having an "na" in the middle of the string? E.g., this query would find "jonathan" since "na" is a substring of the entire string.
Is there a website or guide with this information?
You need this to find people with "na" in the name.
Person.where(first_name: /na/i)
As for your example:
Person.where(first_name: /^d/i)
^ means "beginning of the line". This regex will match all strings where the first letter is "d". /i means "do case-insensitive matches", so it'll match both "d" and "D".
Note: only prefix regexes (with ^ in front) are able to use indexes.
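To see what the two regexes select without a database, here is a small sketch that applies them to a plain Ruby array. The names are made up, and the array is only a stand-in for the Person collection; Mongoid itself is not involved.

```ruby
# Hypothetical first names standing in for Person documents.
names = %w[Dana david jonathan Nadia Bob]

# Anchored and case-insensitive: first letter is "d" or "D".
starts_with_d = names.grep(/^d/i)     # ["Dana", "david"]

# Unanchored and case-insensitive: "na" anywhere in the name.
contains_na = names.grep(/na/i)       # ["Dana", "jonathan", "Nadia"]

# Without the i modifier only the lowercase "d" matches.
lowercase_d_only = names.grep(/^d/)   # ["david"]

p starts_with_d, contains_na, lowercase_d_only
```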
Is there a website or guide with this information?
Here's my favourite.
This is not a "wildcard" search; it is called a regular expression.
/^d/i
The two slashes are only the regex delimiters; the pattern you search for is what sits between them.
The i that follows is a modifier (or option). It changes the matching behaviour of your regex: i stands for case-insensitive, meaning the pattern matches both "d" and "D".
The first character, ^, is an anchor. It anchors the search pattern to the start of the string, meaning "d" is matched only at the start of the string.
A good tutorial about regular expressions is the tutorial on regular-expressions.info
If you want to search for a string anywhere in the string, just remove the anchor that binds the pattern to the start: /na/ will find "na" anywhere in the string.
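The three parts (delimiters, modifier, anchor) can be inspected on the Regexp object itself in plain Ruby. This is only an illustrative sketch with made-up test strings.

```ruby
pattern = /^d/i

# The delimiters are not part of the pattern: source is what sits between them.
source = pattern.source              # "^d"

# The i modifier is stored as an option flag.
case_insensitive = pattern.casefold? # true

# With the anchor, "d" must come first; =~ returns the match index or nil.
at_start = ("Dan" =~ pattern)        # 0
not_at_start = ("adam" =~ pattern)   # nil

# Without the anchor, the pattern may match anywhere.
anywhere = ("jonathan" =~ /na/)      # 2

p source, case_insensitive, at_start, not_at_start, anywhere
```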
I have the string "(1,2,3,4,5,6),(1,2,3)" and I would like to change it to "('1','2','3','4','5','6'),('1','2','3')": replace all parts that match /([^,)("])/ with '$1', '$2', etc.
"(1,2,3,4,5,6),(1,2,3)".gsub(/([^,)("]\w*)/,"'\\1'")
gsub is a "global replace" method in String class. It finds all occurrences of given regular expression and replaces them with the string given as the second parameter (as opposed to sub which replaces first occurrence only). That string can contain references to groups marked with () in the regexp. First group is \1, second is \2, and so on.
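A short sketch of the difference between gsub and sub, and of group references, using the string from the question. The "key=value" line is an extra made-up example that shows a second group.

```ruby
input   = "(1,2,3,4,5,6),(1,2,3)"
pattern = /([^,)("]\w*)/

# gsub replaces every occurrence; \1 is the text captured by the first group.
all_quoted = input.gsub(pattern, "'\\1'")
# => "('1','2','3','4','5','6'),('1','2','3')"

# sub replaces only the first occurrence.
first_quoted = input.sub(pattern, "'\\1'")
# => "('1',2,3,4,5,6),(1,2,3)"

# Two groups, referenced as \1 and \2 (single quotes keep the backslashes literal).
swapped = "key=value".sub(/(\w+)=(\w+)/, '\2=\1')
# => "value=key"

puts all_quoted, first_quoted, swapped
```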
Try
mystring.gsub(/([\w.]+)/, '\'\1\'')
This will replace numbers (ints/floats) and words with their "quote-surrounded" selves while leaving punctuation (except the dot) alone.
UPDATED: I think you want to search for this
(([^,)("])+)
And replace it with this
'$1'
The pattern looks for one or more characters that are not a comma, a parenthesis, or a double quote, and the outer parentheses assign the whole run to the $1 slot. The replacement then wraps whatever was found in single quotes.
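The + matters as soon as a value has more than one character. Here is a minimal Ruby sketch, with a made-up input, that compares the single-character pattern from the question with the repeated one (Ruby writes the replacement as \1 rather than $1).

```ruby
input = "(10,2)"

# One character per match: every character gets its own pair of quotes.
per_char = input.gsub(/([^,)("])/, "'\\1'")     # "('1''0','2')"

# One or more characters per match: each value is quoted as a whole.
per_value = input.gsub(/(([^,)("])+)/, "'\\1'") # "('10','2')"

puts per_char, per_value
```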
I want to use validates_format_of to validate a comma separated string with only letters (small and caps), and numbers.
So.
example1, example2, 22example44, ex24
not:
^&*, <> , asfasfsdafas<#%$#
Basically I want to have users enter comma separated words(incl numbers) without special characters.
I'll use it to validate tags from acts_as_taggable_on. (i don't want to be a valid tag for example.
Thanks in advance.
You can always test regular expressions at rubular. You would find that both tiftik's and Tim's regular expressions work, albeit with some strange edge cases around whitespace.
Tim's solution can be extended to include leading and trailing whitespace and that should then do what you want as follows :-
^\s*[A-Za-z0-9]+(\s*,\s*[A-Za-z0-9]+)*\s*$
Presumably when you have validated the input string you will want to turn it into an array of tags to iterate over. You can do this as follows :-
array_var = string_var.delete(' ').split(',')
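Putting the two steps together, a rough sketch might look like this. The constant `TAG_LIST` and the method `parse_tags` are hypothetical names, and the pattern is anchored with \A and \z instead of ^ and $ (an assumption, made so that the whole string is checked rather than a single line).

```ruby
# The extended pattern from above, anchored to the whole string.
TAG_LIST = /\A\s*[A-Za-z0-9]+(\s*,\s*[A-Za-z0-9]+)*\s*\z/

# Returns an array of tags, or nil when the input does not validate.
def parse_tags(input)
  return nil unless TAG_LIST.match?(input)
  input.delete(' ').split(',')
end

p parse_tags(" example1, example2 ,22example44,ex24 ")
p parse_tags('^&*, <> , asfasfsdafas<#%$#')
```

Note that delete(' ') removes only spaces, while \s also accepts tabs and newlines. If those can occur, input.split(',').map(&:strip) is a safer way to split.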
^([a-zA-Z0-9]+,\s*)*[a-zA-Z0-9]+$
Note that this regex doesn't match values that contain whitespace, so it won't match multiple words like "abc xyz, fgh qwe". It does allow any amount of whitespace after commas. You might not need ^ or $ if validates_format_of tries to match the whole string; I've never used Rails, so I don't know about that.
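On the anchor question: in Ruby, ^ and $ match at line boundaries, not only at the ends of the string, so a multi-line value can slip past a ^...$ pattern. The sketch below shows this in plain Ruby with a made-up input. As far as I know, recent Rails versions also reject ^ and $ in validates_format_of unless multiline: true is passed, so treat \A and \z as the safer default.

```ruby
LINE_ANCHORED   = /^([a-zA-Z0-9]+,\s*)*[a-zA-Z0-9]+$/
STRING_ANCHORED = /\A([a-zA-Z0-9]+,\s*)*[a-zA-Z0-9]+\z/

# Hypothetical input: a valid first line followed by unwanted characters.
input = "tag1, tag2\n<script>"

line_result   = LINE_ANCHORED.match?(input)    # true: the first line alone matches
string_result = STRING_ANCHORED.match?(input)  # false: the whole string must match

p line_result, string_result
```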
^[A-Za-z0-9]+([ \t]*,[ \t]*[A-Za-z0-9]+)*$
should match a CSV line that only contains those characters, whether it's just one value or many.
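The three patterns suggested in this thread differ mainly in where they tolerate whitespace. Here is a small sketch that runs each of them over a few made-up inputs. The hash labels and the helper `accepted_by` are hypothetical names chosen for this example.

```ruby
PATTERNS = {
  "spaces or tabs around commas"    => /^[A-Za-z0-9]+([ \t]*,[ \t]*[A-Za-z0-9]+)*$/,
  "whitespace after commas only"    => /^([a-zA-Z0-9]+,\s*)*[a-zA-Z0-9]+$/,
  "whitespace anywhere around tags" => /^\s*[A-Za-z0-9]+(\s*,\s*[A-Za-z0-9]+)*\s*$/
}.freeze

# Returns the labels of the patterns that accept the given input.
def accepted_by(input)
  PATTERNS.select { |_label, regex| regex.match?(input) }.keys
end

["a,b", "a , b", " a, b ", "a b, c"].each do |sample|
  puts "#{sample.inspect}: #{accepted_by(sample).inspect}"
end
```

None of the three accepts "a b, c", which is consistent with the requirement that tags contain only letters and digits.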