Using string.find - lua

I'm trying write some code that looks at two data sets and matches them (if match), at the moment I am using string.find and this kinda work but its very rigid. For example: it works on check1 but not on check2/3, as theres a space in the feed or some other word. i like to return a match on all 3 of them but how can i do that? (match by more than 4 characters, maybe?)
check1 = 'jan'
check2 = 'janAnd'
check3 = 'jan kevin'
input = 'jan is friends with kevin'
if string.find(input.. "" , check1 ) then
print("match on jan")
end
if string.find( input.. "" , check2 ) then
print("match on jan and")
end
if string.find( input.. "" , check3 ) then
print("match on jan kevin")
end
PS: i have tried gfind, gmatch, match, but no luck with them

find only does direct match, so if the string you are searching is not a substring you are searching in (with some pattern processing for character sets and special characters), you get no match.
If you are interested in matching those strings you listed in the example, you need to look at fuzzy search. This SO answer may help as well as this one. I've implemented the algorithm listed in the second example, but got better results with two- and tri-gram matching based on this algorithm.

Lua's string.find works not just with exact strings but with patterns as well. But the syntax is a bit different from what you have in your "checks". You'd want check2 to be "jan.+" to match "jan" followed by one or more characters. Your third check will need to be jan.+kevin. Here the dot stands for any character, while the following plus sign indicates that this might be a sequence of one or more characters. There's more info at http://www.lua.org/pil/20.2.html.

Related

How to specify a range in Ruby

I've been looking for a good way to see if a string of items are all numbers, and thought there might be a way of specifying a range from 0 to 9 and seeing if they're included in the string, but all that I've looked up online has really confused me.
def validate_pin(pin)
(pin.length == 4 || pin.length == 6) && pin.count("0-9") == pin.length
end
The code above is someone else's work and I've been trying to identify how it works. It's a pin checker - takes in a set of characters and ensures the string is either 4 or 6 digits and all numbers - but how does the range work?
When I did this problem I tried to use to_a? Integer and a bunch of other things including ranges such as (0..9) and ("0..9) and ("0".."9") to validate a character is an integer. When I saw ("0-9) it confused the heck out of me, and half an hour of googling and youtube has only left me with regex tutorials (which I'm interested in, but currently just trying to get the basics down)
So to sum this up, my goal is to understand a more semantic/concise way to identify if a character is an integer. Whatever is the simplest way. All and any feedback is welcome. I am a new rubyist and trying to get down my fundamentals. Thank You.
Regex really is the right way to do this. It's specifically for testing patterns in strings. This is how you'd test "do all characters in this string fall in the range of characters 0-9?":
pin.match(/\A[0-9]+\z/)
This regex says "Does this string start and end with at least one of the characters 0-9, with nothing else in between?" - the \A and \z are start-of-string and end-of-string matchers, and the [0-9]+ matches any one or more of any character in that range.
You could even do your entire check in one line of regex:
pin.match(/\A([0-9]{4}|[0-9]{6})\z/)
Which says "Does this string consist of the characters 0-9 repeated exactly 4 times, or the characters 0-9, repeated exactly 6 times?"
Ruby's String#count method does something similar to this, though it just counts the number of occurrences of the characters passed, and it uses something similar to regex ranges to allow you to specify character ranges.
The sequence c1-c2 means all characters between c1 and c2.
Thus, it expands the parameter "0-9" into the list of characters "0123456789", and then it tests how many of the characters in the string match that list of characters.
This will work to verify that a certain number of numbers exist in the string, and the length checks let you implicitly test that no other characters exist in the string. However, regexes let you assert that directly, by ensuring that the whole string matches a given pattern, including length constraints.
Count everything non-digit in pin and check if this count is zero:
pin.count("^0-9").zero?
Since you seem to be looking for answers outside regex and since Chris already spelled out how the count method was being implemented in the example above, I'll try to add one more idea for testing whether a string is an Integer or not:
pin.to_i.to_s == pin
What we're doing is converting the string to an integer, converting that result back to a string, and then testing to see if anything changed during the process. If the result is =>true, then you know nothing changed during the conversion to an integer and therefore the string is only an Integer.
EDIT:
The example above only works if the entire string is an Integer and won’t properly deal with leading zeros. If you want to check to make sure each and every character is an Integer then do something like this instead:
pin.prepend(“1”).to_i.to_s(1..-1) == pin
Part of the question seems to be exactly HOW the following portion of code is doing its job:
pin.count("0-9")
This piece of the code is simply returning a count of how many instances of the numbers 0 through 9 exist in the string. That's only one piece of the relevant section of code though. You need to look at the rest of the line to make sense of it:
pin.count("0-9") == pin.length
The first part counts how many instances then the second part compares that to the length of the string. If they are equal (==) then that means every character in the string is an Integer.
Sometimes negation can be used to advantage:
!pin.match?(/\D/) && [4,6].include?(pin.length)
pin.match?(/\D/) returns true if the string contains a character other than a digit (matching /\D/), in which case it it would be negated to false.
One advantage of using negation here is that if the string contains a character other than a digit pin.match?(/\D/) would return true as soon as a non-digit is found, as opposed to methods that examine all the characters in the string.

How can I combine words with numbers when pattern matching in LUA?

I'm trying to match any strings that come in that follow the format Word 100.00% ~(45.56, 34.76) in LUA. As such, I'm looking to do a regex close (in theory) to this:
%D%s[%d%.%d]%%(%d.%d, %d.%d)
But I'm having no luck so far. LUA's patterns are weird.
What am I missing?
Your pattern is close you neglected to allow for multiple instances of a digit you can do this by using a + at like %d+.
You also did not use [,( and . correctly in the pattern.
[s in a pattern will create a set of chars that you are trying to match such as [abc] means you are looking to match any as bs or c at that position.
( are used to define a capture so the specific values you want returned rather then the whole string in the event of a match, in order to use it as a char you for the match you need to escape it with a %.
. will match any character rather then specifically a . you will need to add a % to escape if you want to match a . specifically.
local str = "Word 100.00% ~(45.56, 34.76)"
local pattern = "%w+%s%d+%.%d+%%%s~%(%d+%.%d+, %d+%.%d+%)"
print(string.match(str, pattern))
Here you will see the input string print if it matches the pattern otherwise you will see nil.
Suggested resource: Understanding Lua Patterns

Lua Pattern Matching, get character before match

Currently I have code that looks like this:
somestring = "param=valueZ&456"
local stringToPrint = (somestring):gsub("(param=)[^&]+", "%1hello", 1)
StringToPrint will look like this:
param=hello&456
I have replaced all of the characters before the & with the string "hello". This is where my question becomes a little strange and specific.
I want my string to appear as: param=helloZ&456. In other words, I want to preserve the character right before the & when replacing the string valueZ with hello to make it helloZ instead. How can this be done?
I suggest:
somestring:gsub("param=[^&]*([^&])", "param=hello%1", 1)
See the Lua demo
Here, the pattern matches:
param= - literal substring param=
[^&]* - 0 or more chars other than & as many as possible
([^&]) - Group 1 capturing a symbol other than & (here, backtracking will occur, as the previous pattern grabs all such chars other than & and then the engine will take a step back and place the last char from that chunk into Group 1).
There are probably other ways to do this, but here is one:
somestring = "param=valueZ&456"
local stringToPrint = (somestring):gsub("(param=).-([^&]&)", "%1hello%2", 1)
print(stringToPrint)
The thing here is that I match the shortest string that ends with a character that is not & and a character that is &. Then I add the two ending characters to the replaced part.

Best way to count words in a string in Ruby?

Is there anything better than string.scan(/(\w|-)+/).size (the - is so, e.g., "one-way street" counts as 2 words instead of 3)?
string.split.size
Edited to explain multiple spaces
From the Ruby String Documentation page
split(pattern=$;, [limit]) → anArray
Divides str into substrings based on a delimiter, returning an array
of these substrings.
If pattern is a String, then its contents are used as the delimiter
when splitting str. If pattern is a single space, str is split on
whitespace, with leading whitespace and runs of contiguous whitespace
characters ignored.
If pattern is a Regexp, str is divided where the pattern matches.
Whenever the pattern matches a zero-length string, str is split into
individual characters. If pattern contains groups, the respective
matches will be returned in the array as well.
If pattern is omitted, the value of $; is used. If $; is nil (which is
the default), str is split on whitespace as if ' ' were specified.
If the limit parameter is omitted, trailing null fields are
suppressed. If limit is a positive number, at most that number of
fields will be returned (if limit is 1, the entire string is returned
as the only entry in an array). If negative, there is no limit to the
number of fields returned, and trailing null fields are not
suppressed.
" now's the time".split #=> ["now's", "the", "time"]
While that is the current version of ruby as of this edit, I learned on 1.7 (IIRC), where that also worked. I just tested it on 1.8.3.
I know this is an old question, but this might be useful to someone else looking for something more sophisticated than string.split. I wrote the words_counted gem to solve this particular problem, since defining words is pretty tricky.
The gem lets you define your own custom criteria, or use the out of the box regexp, which is pretty handy for most use cases. You can pre-filter words with a variety of options, including a string, lambda, array, or another regexp.
counter = WordsCounted::Counter.new("Hello, Renée! 123")
counter.word_count #=> 2
counter.words #=> ["Hello", "Renée"]
# filter the word "hello"
counter = WordsCounted::Counter.new("Hello, Renée!", reject: "Hello")
counter.word_count #=> 1
counter.words #=> ["Renée"]
# Count numbers only
counter = WordsCounted::Counter.new("Hello, Renée! 123", rexexp: /[0-9]/)
counter.word_count #=> 1
counter.words #=> ["123"]
The gem provides a bunch more useful methods.
If the 'word' in this case can be described as an alphanumeric sequence which can include '-' then the following solution may be appropriate (assuming that everything that doesn't match the 'word' pattern is a separator):
>> 'one-way street'.split(/[^-a-zA-Z]/).size
=> 2
>> 'one-way street'.split(/[^-a-zA-Z]/).each { |m| puts m }
one-way
street
=> ["one-way", "street"]
However, there are some other symbols that can be included in the regex - for example, ' to support the words like "it's".
This is pretty simplistic but does the job if you are typing words with spaces in between. It ends up counting numbers as well but I'm sure you could edit the code to not count numbers.
puts "enter a sentence to find its word length: "
word = gets
word = word.chomp
splits = word.split(" ")
target = splits.length.to_s
puts "your sentence is " + target + " words long"
The best way to do is to use split method.
split divides a string into sub-strings based on a delimiter, returning an array of the sub-strings.
split takes two parameters, namely; pattern and limit.
pattern is the delimiter over which the string is to be split into an array.
limit specifies the number of elements in the resulting array.
For more details, refer to Ruby Documentation: Ruby String documentation
str = "This is a string"
str.split(' ').size
#output: 4
The above code splits the string wherever it finds a space and hence it give the number of words in the string which is indirectly the size of the array.
The above solution is wrong, consider the following:
"one-way street"
You will get
["one-way","", "street"]
Use
'one-way street'.gsub(/[^-a-zA-Z]/, ' ').split.size
This splits words only on ASCII whitespace chars:
p " some word\nother\tword|word".strip.split(/\s+/).size #=> 4

a simple regexp validator

How do a I create a validator, which has these simple rules. An expression is valid if
it must start with a letter
it must end with a letter
it can contain a dash (minus sign), but not at start or end of the expression
^[a-zA-Z]+-?[a-zA-Z]+$
E.g.
def validate(whatever)
reg = /^[a-zA-Z]+-?[a-zA-Z]+$/
return (reg.match(whatever)) ? true : false;
end
/^[A-Za-z]+(-?[A-Za-z]+)?$/
this seems like what you want.
^ = match the start position
^[A-Za-z]+ = start position is followed by any at least one or more letters.
-? = is there zero or one hyphens (use "*" if there can be multiple hyphens in a row).
[A-Za-z]+ = hyphen is followed by one or more letters
(-?[A-Za-z]+)? = for the case that there is a single letter.
$= match the end position in the string.
xmammoth pretty much got it, with one minor problem. My solution is:
^[a-zA-Z]+\-?[a-zA-Z]+$
Note that the original question states, it can contain a dash. The question mark is needed after the dash to make sure that it is optional in the regex.
^[A-Za-z].*[A-Za-z]$
In other words: letter, anything, letter.
Might also want:
^[A-Za-z](.*[A-Za-z])?$
so that a single letter is also matched.
What I meant, to be able to create tags. For example: "Wild-things" or "something-wild" or "into-the-wild" or "in-wilderness" "my-wild-world" etc...
This regular expression matches sequences, that consist of one or more words of letters, that are concatenated by dashes.
^[a-zA-Z]+(?:-[a-zA-Z]+)*$
Well,
[A-Za-z].*[A-Za-z]
According to your rules that will work. It will match anything that:
starts with a letter
ends with a letter
can contain a dash (among everything else) in between.

Resources