How to pattern match in Lua - lua

So let's say I have this in Lua:
myvara = "Box red"
myvarb = "Box red 36"
How do I form an expression to see if both variables are the same if the number changes every time? i.e. I just want to check if both variables are red boxes but the number is not important.
I want to use pattern matching but I don't know how to do so efficiently and in an expression. I don't want to use string.find, it has to be pattern matching.
What I need to be able to do is:
if myvara == myvarb (ignoring box number) then...
... with pattern matching (not string.find or anything like that).
Oh, and there might be a different number of words sometimes and the number might be in a different place. That's why I need to use pattern matching.
Thank you.

You can remove all spaces and digits from both strings before comparing them:
if (myvara:gsub("[%d ]","") == myvarb:gsub("[%d ]","")) then
....
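For readers comparing against other regex engines, here is a minimal sketch of the same strip-then-compare idea in Python. The function name same_ignoring_numbers is hypothetical, and [0-9 ] is used as the assumed counterpart of Lua's [%d ] class. Note that because spaces are removed too, "Box red" and "Boxred" would also compare equal.

```python
import re

def same_ignoring_numbers(a, b):
    """Compare two strings after deleting ASCII digits and spaces."""
    def strip(s):
        # [0-9 ] plays the role of Lua's [%d ] character class here
        return re.sub(r"[0-9 ]", "", s)
    return strip(a) == strip(b)

print(same_ignoring_numbers("Box red", "Box red 36"))   # True
print(same_ignoring_numbers("Box red", "Box blue 36"))  # False
```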

Related

Character Replacements

I have a Unicode string UniStr.
I also have a MAP of { UniCodeChar : otherMappedStrs }
I need the 'otherMappedStrs' version of UniStr.
Eg: UniStr = 'ABC', MAP = { 'A':'233','B':'#$','C':'9ij' }, Result = '233#$9ij'
I have come up with the formula below which works;
=ArrayFormula(JOIN("",VLOOKUP(REGEXEXTRACT(A1,REPT("(.)",LEN(A1))),MapRange,2,FALSE)))
The MAP being a whole character set (40 chars) is quite large.
I need to use this function in multiple spreadsheets. How can I subsume the MAP into the formula for portability?
Is there a better way to iterate over a string in a formula than the REGEXEXTRACT method? This method has limitations for long strings.
I also tested the formula below. The problem here is that it gives 2 results (or as many as the size of the array within the SUBSTITUTE replacement). If 3 substitutions are made, then it gives three results. Can this be resolved?
=ArrayFormula(SUBSTITUTE(A1,{"s","i"},{"#","#"}))
EDIT:
@Tom's first solution appears best for my case: (1) REGEX has an upper limit on search criteria, which does not affect your solution; (2) it feels fast (I did not do empirical testing); (3) I believe this is a better way to iterate over string characters (you answered my Q2, thanks).
I digress here. I wish Google would introduce Named-Formulas or Formula-Aliases, hypothetically as in the example below. I have sent feedback along those lines many times. Nothing :(
MyFormula($str) == ArrayFormula(join(,vlookup(mid($str,row(indirect("1:"&len($str))),1), { "A","233";"B","#$";"C","9ij" },2,false)))
Not sure how long you want your strings to be, but the more traditional
=ArrayFormula(join(,vlookup(mid(A1,row(indirect("1:"&len(A1))),1), { "A","233";"B","#$";"C","9ij" },2,false)))
seems a bit more robust for long strings.
For a more radical idea, supposing the maximum length of your otherMappedStrings is 3 characters, then you could try:
=ArrayFormula(join(,trim(mid("233 #$9ij",find(mid(A1,row(indirect("1:"&len(A1))),1), "ABC")*3-2,3))))
where I have put a space in before #$ to pad it out to 3 characters.
Incidentally the original VLOOKUP is not case sensitive. If you want this behaviour, use SEARCH instead of FIND.
You seem to have several different Qs, but considering only portability, perhaps something like the following would help:
=join(,switch(arrayformula(regexextract(A1&"",rept("(.)",len(A1)))),"A",233,"B","#$","C","9ij"))
extended with 37 more pairs.
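To make the character-by-character lookup in these formulas easier to follow, here is a minimal Python sketch of the same idea. CHAR_MAP and translate are hypothetical names, and the three pairs are just the ones from the question's example. Unlike VLOOKUP, the dictionary lookup here is case sensitive.

```python
# Hypothetical mapping, copied from the question's example
CHAR_MAP = {"A": "233", "B": "#$", "C": "9ij"}

def translate(text, mapping=CHAR_MAP):
    """Replace every character of text with its mapped string.

    Raises KeyError for a character missing from the mapping,
    roughly as VLOOKUP would return #N/A.
    """
    return "".join(mapping[ch] for ch in text)

print(translate("ABC"))  # 233#$9ij
```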

Active record queries condition using an array and the % sign

The Ruby on Rails guides suggest avoiding conditions written as pure strings.
I am writing a simple search form for users and I am still undecided about which argument to use in place of the question mark. Following documentation I found online, I tried the following expression:
find_by("name LIKE ?", "%#{search}%")
I found, after many attempts, the following alternative best suited for my needs:
find_by("name = ?", "#{search}")
What I do not understand is the use of the double % in the first expression, one at the beginning of the interpolated string and the second closing it.
As far as I understood, the LIKE in the first expression is used to return a user based on an incomplete query, such as using Exam to find Example User. However, if I remove the double % it behaves like the second expression. So what is the double % useful for?
You almost have it. LIKE matches values against a pattern, and % is the wildcard that makes the match more specific. For example, LIKE 'exam%' matches anything starting with exam, such as exam or example, but not preexam. In the same way, LIKE '%exam' matches strings ending with exam, such as preexam, but not example.
And LIKE '%exam%' matches a string that has exam anywhere in it, whether at the start, middle, or end, such as example, preexam, or myexamination.
So without the % signs it just searches for that exact string.
This is more a question about your database (which is not included in the question) rather than Rails, but I will draw from the Postgres documentation. Your assumption about 'like' automatically matching partial strings is incorrect.
If pattern does not contain percent signs or underscore, then the
pattern only represents the string itself; in that case LIKE acts like
the equals operator. An underscore (_) in pattern stands for (matches)
any single character; a percent sign (%) matches any string of zero or
more characters.
Source: https://www.postgresql.org/docs/8.3/static/functions-matching.html
As you can see, if the pattern contains no wildcard characters, LIKE behaves the same as the equals operator.
Final note: if you do not care about case sensitivity in your pattern matching, use ILIKE instead of LIKE in Postgres.
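The wildcard behaviour described in these answers can be checked with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database rather than Postgres or Rails, which is an assumption made only to keep the example self-contained. The users table and the search_names helper are hypothetical. SQLite's LIKE is case insensitive for ASCII by default, so the sample data is kept lowercase.

```python
import sqlite3

def search_names(pattern):
    """Return the names matching a LIKE pattern, sorted alphabetically."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)",
        [("exam",), ("example",), ("preexam",), ("myexamination",)],
    )
    # The ? placeholder keeps the pattern out of the SQL string itself
    rows = conn.execute(
        "SELECT name FROM users WHERE name LIKE ? ORDER BY name",
        (pattern,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

print(search_names("exam"))    # no wildcard: behaves like equality
print(search_names("exam%"))   # starts with exam
print(search_names("%exam"))   # ends with exam
print(search_names("%exam%"))  # contains exam anywhere
```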

Regex that finds a line with exactly 3 words in it

I have a problem that requires me to write a regex that finds a line containing exactly 3 groups of characters (they could be words or numbers) and ending with another specific word. The way I had in mind was to find a pattern that ends in a space, and look for it 3 times. Assuming this is the correct way to go about it, I do not know how to find a space, but I thought it would look like .*"find a space"{3} endword$. Is this the way it would be done? Even if it is not the way to do it, how do you find a space? Any suggestions?
Assuming that by three groups of characters you mean runs of any non-space characters, you could write:
/^\s*(?:\S+\s+){3}endword$/
The initial caret is to make sure you have exactly 3 non-space groups on the line.
Of course you need to consider whether things like control characters could appear, and adjust accordingly.
Depending on your flavor, something like the below would do it:
\b+.+?\b+.+?\b+.+?\bendword$
This makes use of the word boundary mark (\b) and non-greedy repetitions (+?), so it may be slightly different in your specific implementation, especially if you're using something old like grep.
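As a quick check of the first pattern above, here is a minimal sketch using Python's re module. The helper name has_three_groups is hypothetical, and endword stands in for whatever the specific final word is.

```python
import re

# Three runs of non-space characters, each followed by whitespace,
# then the literal final word at the end of the line.
THREE_GROUPS = re.compile(r"^\s*(?:\S+\s+){3}endword$")

def has_three_groups(line):
    return THREE_GROUPS.search(line) is not None

print(has_three_groups("one two three endword"))       # True
print(has_three_groups("one two endword"))             # False
print(has_three_groups("one two three four endword"))  # False
```

Because \S+ cannot cross whitespace, the {3} count cannot be satisfied by splitting a single word, so lines with fewer or more groups are rejected.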

How to get a % difference of two NSStrings

I'm thinking this may be impossible to do reasonably, but I figured I would take a shot at it. So let's say I have two NSStrings. One is @"Singin' In The Rain" and the other is @"Singing In The Rain". These strings are very similar, but have a small difference. I'm trying to find a way where I could write something like the following:
NSString *stringOne = @"Singin' In The Rain";
NSString *stringTwo = @"Singing In The Rain";
float dif = [stringOne differenceFrom:stringTwo];
//dif = .9634 or something like that
One project that I did find similar to this was taken from the previous similar question on Stack Overflow: Check if two NSStrings are similar. However, this simply returns a BOOL which isn't as accurate as I need it to be. I also tried looking into the compare: documentation for NSString but it all looked too basic. Another similar thing I found is at https://gist.github.com/iloveitaly/1515464. However, this gives varying results, even saying two of the same string are different occasionally. Any advice would be much appreciated.
The question is a little vague, but I would assume that the most satisfactory results will come from using NSLinguisticTagger. If you parse each for tags with the NSLinguisticTagSchemeLexicalClass scheme then your string will be broken down into verbs, nouns, adjectives, etc. In your example, even if you weren't spotting that singin' and singing are the same, you'd spot the other three words are the same and that the thing at the end is a noun, so they're both about doing something in the same thing.
It'd probably be wise to use something like a BK-Tree to compare individual words where you suspect there may be a match (a noun obviously doesn't match an adverb but two nouns may match even if spellings differ).
Another off the wall suggestion:
The source, and hence the algorithm, for diff and similar programs is easily available. These compare input on a line-by-line basis and detect insertions, deletions and changes.
When comparing text strings for "closeness" then the insertion, deletion or changing of words seems as good a measure as any.
So:
Break each string into "words" (white space separated should be sufficient).
Compare the two lists using the diff algorithm, treating each "word" as a "line", use a re-sync length of 1 (the number of "lines" that need to be the same to treat the two inputs as back in sync)
Calculate the "closeness" as the number of insertions/deletions/changes compared to the total word count.
For the two example strings this would give 1:4 changes or 75% similar.
If you want greater granularity for each change split the two words into characters and repeat the algorithm giving you a fraction the word is similar by (as opposed to the whole word).
For the two example strings this would give 3 6/7 words out of 4, or 96% similar.
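The steps above can be sketched in Python. This uses difflib.SequenceMatcher from the standard library as a stand-in for the diff algorithm, which is an assumption: it is a related matching algorithm, not the Unix diff source itself. The function names word_similarity and refined_similarity are hypothetical.

```python
from difflib import SequenceMatcher

def word_similarity(a, b):
    """Fraction of words that match when each word is treated as a 'line'."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def refined_similarity(a, b):
    """Like word_similarity, but changed words earn partial credit
    based on a character-level comparison."""
    wa, wb = a.split(), b.split()
    score = 0.0
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, wa, wb).get_opcodes():
        if tag == "equal":
            score += i2 - i1
        elif tag == "replace":
            # Pair changed words in order and compare them per character
            for x, y in zip(wa[i1:i2], wb[j1:j2]):
                score += SequenceMatcher(None, x, y).ratio()
    total = max(len(wa), len(wb))
    return score / total if total else 1.0

one = "Singin' In The Rain"
two = "Singing In The Rain"
print(word_similarity(one, two))     # 0.75
print(refined_similarity(one, two))  # about 0.964
```

For the two example strings this reproduces the figures in the answer: 3 of 4 words match, and the changed word scores 6/7 at the character level.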
I'd recommend dynamic time warping for such comparisons:
http://en.wikipedia.org/wiki/Dynamic_time_warping
This will, however, return the distance between two strings (so you'll get 0 for identical strings), but this is the best starting point I can think of.
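A minimal sketch of dynamic time warping applied to strings might look like the following. The per-character cost of 0 for equal characters and 1 otherwise is an assumption made for illustration, and dtw_distance is a hypothetical name. Note that DTW lets one character align with several repeated characters at no cost, so strings such as "aab" and "ab" come out at distance 0.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two strings.

    Cost is 0 when aligned characters are equal and 1 otherwise
    (an assumed cost function for this sketch).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = best cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

print(dtw_distance("Singing In The Rain", "Singing In The Rain"))  # 0.0
print(dtw_distance("Singin' In The Rain", "Singing In The Rain"))  # 1.0
```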

Regex: Matching on an exclusive either/or

I want a regex that will match for strings of RACE or RACE_1, but not RACE_2 and RACE_3. I've been on Rubular for a while now trying to figure it out, but can't seem to get all the conditions I need met. Help is appreciated.
/^RACE(_1)?$/
Rubular example here
RACE(_1)?\b
\b is a word boundary. Because _ counts as a word character, there is no boundary between RACE and _2, and that prevents matching RACE in RACE_2.
You can use:
(\bRACE(_[1])?\b)
It requires one copy of RACE, followed by zero or one occurrence of _[1]. Inside the square brackets you can list any digits you want. EXAMPLE:
(\bRACE(_[12345])?\b) will match up to RACE_5. You can also skip digits if you want: [1245] allows RACE_1, RACE_2, RACE_4, and RACE_5 but not RACE_3.
/RACE(?!_)|RACE_1/
It's a bit of a hack but might fit your needs.
EDIT:
Here might be a more specific one that works better
/RACE(?!_\d)|RACE_1/
In both cases, a negative lookahead ensures that RACE is not followed by _ (or, in the second version, by _ and a digit), and the alternative after | then explicitly allows RACE_1.
Also, if you plan on only searching for matches that are whole words, add \b on both sides to designate word boundaries. Wrap the alternation in a group so that both \b apply to both alternatives:
/\b(?:RACE(?!_\d)|RACE_1)\b/
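The patterns in these answers can be checked with a short sketch in Python's re module (the question used Rubular, so the Ruby engine may differ in small details). The names is_race and find_races are hypothetical. The second pattern wraps the lookahead alternation in a non-capturing group, which is an addition made here so that \b applies to both alternatives.

```python
import re

# Whole-string check, as in the first answer
WHOLE = re.compile(r"^RACE(_1)?$")

# Search inside longer text: lookahead version with the alternation
# grouped so the word boundaries cover both branches
IN_TEXT = re.compile(r"\b(?:RACE(?!_\d)|RACE_1)\b")

def is_race(s):
    return WHOLE.match(s) is not None

def find_races(text):
    return IN_TEXT.findall(text)

print(is_race("RACE"), is_race("RACE_1"))    # True True
print(is_race("RACE_2"), is_race("RACE_3"))  # False False
print(find_races("RACE RACE_1 RACE_2 RACES"))  # ['RACE', 'RACE_1']
```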

Resources